This document describes the provisioning models for Compute Engine instances.
When you create an instance, you must specify the method, called a provisioning model, that you want to use to obtain your requested resources. Each provisioning model determines the availability, lifespan, and pricing of your instances. By understanding these models, you can choose the best option for your workload.
Available provisioning models
When you create a compute instance, you can specify one of the following provisioning models. If you don't specify a provisioning model, then Compute Engine uses the standard provisioning model by default.
- Standard
- Spot
- Flex-start
- Reservation-bound
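As an illustration, the following minimal sketch creates a Spot instance by using the google-cloud-compute Python client library and sets the provisioning model in the instance's scheduling configuration. The project, zone, machine type, and image shown here are placeholder choices, not requirements of the Spot model; if you omit the provisioning_model field, Compute Engine uses the standard provisioning model by default.

```python
# Minimal sketch: create a Spot instance with the google-cloud-compute
# Python client library. Project, zone, and resource names are placeholders.
from google.cloud import compute_v1


def create_spot_instance(project_id: str, zone: str, name: str) -> compute_v1.Instance:
    instance_client = compute_v1.InstancesClient()

    # Boot disk based on a public Debian image.
    boot_disk = compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-12",
            disk_size_gb=10,
        ),
    )

    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{zone}/machineTypes/e2-medium",
        disks=[boot_disk],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
        # The provisioning model is part of the scheduling configuration.
        # Omitting provisioning_model selects the standard model by default.
        scheduling=compute_v1.Scheduling(
            provisioning_model=compute_v1.Scheduling.ProvisioningModel.SPOT.name,
            # What Compute Engine does with the instance when it reclaims capacity.
            instance_termination_action="STOP",
        ),
    )

    operation = instance_client.insert(
        project=project_id, zone=zone, instance_resource=instance
    )
    operation.result()  # Wait for the create operation to finish.
    return instance_client.get(project=project_id, zone=zone, instance=name)
```

In this sketch, instance_termination_action controls whether Compute Engine stops or deletes the instance when it reclaims capacity.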
The following comparison helps you evaluate the resource availability, instance lifespan, ideal workloads, and pricing for each provisioning model.
Resource availability and instance lifespan:
- Standard:
  - Based on resource availability, you can immediately create instances.
  - You can control when to stop or delete instances.
- Spot:
  - Based on resource availability, you can immediately create instances.
  - You can control when to stop or delete instances. However, you also allow Compute Engine to stop or delete instances at any time to reclaim capacity.
- Flex-start:
  - Based on resource availability, you can create instances within a specified waiting time. For a standalone instance, you can specify a waiting time of up to two hours. For a MIG resize request, the waiting time is indefinite.
  - You can control when to stop or delete instances. However, you can't suspend or recreate them. Instances run for a minimum of 10 minutes up to a maximum of seven days. When the instances reach the end of their run duration, Compute Engine stops or deletes them based on their termination action (see the configuration sketch after this comparison).
- Reservation-bound:
  - You can request to reserve capacity at a future date for creating instances with GPUs attached. If Google Cloud approves your request, then Compute Engine creates a reservation. At the start of the reservation period, you can consume the reservation by creating GPU instances that match the reservation.
  - During the approved reservation period, you can stop, restart, delete, and recreate instances to consume the reservation as needed. When the reservation period ends, Compute Engine deletes the reservation, and stops or deletes any instances that consume the reservation based on their termination action.
Ideal workloads:
- Standard: ideal for workloads that require stability and continuous operation, such as the following:
  - Web servers
  - Databases
  - Enterprise applications
  - Development and testing
- Spot: ideal for workloads that can tolerate interruptions, such as the following:
  - Batch processing
  - High performance computing (HPC)
  - Continuous integration and continuous deployment (CI/CD)
  - Data analytics
  - Media encoding
  - Online inference
- Flex-start: ideal for workloads that require stability and need to run for no more than seven days, such as the following:
  - Small model pre-training
  - Model fine-tuning
  - HPC simulation
  - Batch inference
- Reservation-bound: ideal for workloads that require stability and a specific run time, such as the following:
  - For workloads that last up to 90 days:
    - Model pre-training jobs
    - Model fine-tuning jobs
    - HPC simulation workloads
    - Short-term expected increases in inference workloads
  - For workloads longer than 90 days:
    - Training workloads
    - Inference workloads
Pricing:
- Standard:
  - You incur standard pricing for instances. For more information, see VM instance pricing.
  - You incur charges based on the method that you use to create instances:
    - If you immediately create instances, then you pay as you go (PAYG).
    - If you create instances by using an on-demand reservation or an auto-created reservation for a future reservation, then you're charged for as long as the reservation exists. For more information, see reservations billing.
- Spot:
  - You get discounts of up to 91% for many machine types, GPUs, TPUs, and Local SSD disks. For more information, see Spot VMs pricing.
  - You pay as you go (PAYG).
- Flex-start:
  - Based on the machine series that your instances use, you get a discount as follows:
    - For the A4, A3, and A2 machine series, you get a 53% discount for vCPUs, memory, and GPUs.
    - For the H4D machine series, you get a 25% discount for vCPUs and memory.
  - You pay as you go (PAYG).
- Reservation-bound:
  - You incur charges based on how you reserve capacity for creating instances, as follows:
    - If you reserve capacity in AI Hypercomputer, then you incur charges based on accelerator-optimized VMs pricing. If you reserve resources for a year or longer, then you must purchase and attach a resource-based commitment to your reserved resources.
    - If you reserve capacity by using future reservations in calendar mode, then you incur charges based on Dynamic Workload Scheduler (DWS) pricing.
  - You're charged for the entire reservation period. For more information, see reservations billing.
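The run durations and termination actions in the preceding comparison are expressed through an instance's scheduling configuration. The following sketch, again using the google-cloud-compute Python client library, bounds an instance's run duration and sets a termination action. The FLEX_START value and the exact set of fields that the flex-start model requires are assumptions based on this page's description, so verify them against the API reference for your client library version.

```python
# Sketch only: express a bounded run duration and a termination action
# in an instance's scheduling configuration. The FLEX_START value and the
# exact fields that the flex-start provisioning model requires are
# assumptions; verify them against the Compute Engine API reference.
from google.cloud import compute_v1


def limited_duration_scheduling(hours: int = 24) -> compute_v1.Scheduling:
    return compute_v1.Scheduling(
        provisioning_model="FLEX_START",  # Assumed enum value for flex-start.
        # Run for at most `hours`; flex-start allows up to seven days.
        max_run_duration=compute_v1.Duration(seconds=hours * 3600),
        # When the run duration ends, delete (or stop) the instance.
        instance_termination_action="DELETE",
    )


# Attach the scheduling block to an Instance resource before calling
# InstancesClient.insert(), as in the earlier Spot example.
scheduling = limited_duration_scheduling(hours=72)
```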
Instance availability and lifespan
The following shows the availability and lifespan of compute instances for each provisioning model.
Availability:
- Standard: You can create instances as follows:
  - If you immediately create instances, then Compute Engine makes best-effort attempts to provision your requested capacity.
  - If you create instances by consuming an on-demand reservation or an auto-created reservation for a future reservation, then you have very high assurance that Compute Engine provisions your requested capacity if the reservation has reserved capacity available (see the reservation affinity sketch after this list).
- Flex-start: Compute Engine uses DWS to schedule the provisioning of your requested capacity based on resource availability. DWS helps you obtain high-demand resources like GPUs.
- Reservation-bound: To create instances, you must first reserve capacity by using one of the following methods:
  - To reserve capacity for long-running workloads, use future reservations in AI Hypercomputer.
  - To reserve capacity for workloads that run for up to 90 days, use future reservations in calendar mode.
  At your chosen delivery date and time, Compute Engine provisions your requested capacity. Then, you can consume the capacity by creating instances. Based on how you reserve capacity to create VMs, you can only use the following machine series:
  - If you reserve capacity in AI Hypercomputer, then you can only use the A4X, A4, or A3 Ultra machine series.
  - If you create a future reservation in calendar mode, then you can only use the A4, A3 Ultra, A3 Mega, or A3 High with 8 GPUs machine series.
Lifespan:
- Standard and Spot: You can control when to stop or delete an instance, except in the following cases:
  - For Spot instances, Compute Engine stops or deletes the instance at any time to reclaim capacity. This process is called preemption.
  - If the machine type that the instance uses doesn't support live migration, then Compute Engine stops the instance during host maintenance events.
- Flex-start: Before an instance reaches the end of its run duration, you can do the following:
  - Stop the instance: you can stop the instance at any time only if it was created as a standalone instance.
  - Delete the instance: you can delete the instance at any time.
  When an instance reaches the end of its run duration, Compute Engine deletes it.
- Reservation-bound: You can control when to stop or delete an instance, except in the following cases:
  - Compute Engine stops the instance during host maintenance events.
  - The automatically created reservation that provisions your requested capacity reaches the end of its committed reservation period. At that time, Compute Engine deletes the reservation, and stops or deletes any instances that consume the reservation based on the termination action that is specified in their configuration.
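As noted in the preceding list, standard-model capacity assurance is much higher when an instance consumes a reservation. The following sketch shows one way to target a specific reservation through the instance's reservation affinity by using the google-cloud-compute Python client library; the reservation name is a placeholder, and this is an illustration rather than a required step.

```python
# Sketch: consume a specific reservation by setting reservation affinity
# on the instance. The reservation name is a placeholder.
from google.cloud import compute_v1


def specific_reservation_affinity(reservation_name: str) -> compute_v1.ReservationAffinity:
    return compute_v1.ReservationAffinity(
        # Consume only the named reservation instead of any matching one.
        consume_reservation_type="SPECIFIC_RESERVATION",
        key="compute.googleapis.com/reservation-name",
        values=[reservation_name],
    )


# Attach to an Instance resource before calling InstancesClient.insert():
#   instance.reservation_affinity = specific_reservation_affinity("my-reservation")
```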
Provisioning models for dense deployments
To deploy high performance computing (HPC), artificial intelligence (AI), and machine learning (ML) workloads on Google Cloud, you need compute resources that are physically close to each other to minimize network hops and optimize for the lowest latency. Compute Engine provides provisioning methods that let you reserve tightly coupled groups of hosts interconnected by a high-speed network fabric within a single data center.
For more information about provisioning methods for dense deployments, see the following:
- H4D instances: Overview of HPC cluster creation
- GPU instances: Capacity overview in the AI Hypercomputer documentation.
What's next
- Learn more about Spot VMs.
- Learn more about Flex-start VMs.
- Learn more about VMs that use the reservation-bound provisioning model.

