This document outlines the NVIDIA GPU models available on Compute Engine, which you can use to accelerate machine learning (ML), data processing, and graphics-intensive workloads on your virtual machine (VM) instances. This document also details which GPUs come pre-attached to accelerator-optimized machine series such as A4X, A4, A3, A2, G4, and G2, and which GPUs you can attach to N1 general-purpose instances.
Use this document to compare the performance, memory, and features of different GPU models. For a more detailed overview of the accelerator-optimized machine family, including information on CPU platforms, storage options, and networking capabilities, and to find the specific machine type that matches your workload, see Accelerator-optimized machine family.
For more information about GPUs on Compute Engine, see About GPUs.
To view available regions and zones for GPUs on Compute Engine, see GPU regions and zones availability.
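You can also list the GPU models available in each zone with the gcloud CLI; the zone in the filter is only an example:

```bash
# List all GPU models and the zones where they are available.
gcloud compute accelerator-types list

# Limit the output to a single zone, for example us-central1-a.
gcloud compute accelerator-types list --filter="zone:( us-central1-a )"
```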
GPU machine types
Compute Engine offers different machine types to support your various workloads.
Some machine types support NVIDIA RTX Virtual Workstations (vWS). When you create an instance that uses NVIDIA RTX Virtual Workstation, Compute Engine automatically adds a vWS license. For information about pricing for virtual workstations, see the GPU pricing page.
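For example, the following sketch creates an N1 instance with a T4 vWS accelerator; because the accelerator type ends in -vws, Compute Engine adds the RTX vWS license automatically. The instance name, zone, and image are placeholders:

```bash
# Create an N1 instance with one NVIDIA T4 Virtual Workstation GPU.
# Compute Engine adds the NVIDIA RTX vWS license for -vws accelerator types.
gcloud compute instances create my-vws-instance \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4-vws,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```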
The later-generation A series machine types are ideal for pre-training and fine-tuning foundation models on large clusters of accelerators, while the A2 series can be used for training smaller models and for single-host inference. The G series can also be used for training smaller models and for single-host inference. For all of these machine types, the GPU model is automatically attached to the instance. The accelerator-optimized machine series and their GPU models are as follows:
- A4X: NVIDIA GB200 Superchips (nvidia-gb200)
- A4: NVIDIA B200 (nvidia-b200)
- A3 Ultra: NVIDIA H200 (nvidia-h200-141gb)
- A3 Mega: NVIDIA H100 (nvidia-h100-mega-80gb)
- A3 High: NVIDIA H100 (nvidia-h100-80gb)
- A3 Edge: NVIDIA H100 (nvidia-h100-80gb)
- A2 Ultra: NVIDIA A100 80GB (nvidia-a100-80gb)
- A2 Standard: NVIDIA A100 40GB (nvidia-tesla-a100)
For N1 general-purpose machine types, except for the N1 shared-core machine types (f1-micro and g1-small), you can attach a select set of GPU models. Some of these GPU models also support NVIDIA RTX Virtual Workstations (vWS):
- NVIDIA T4 (nvidia-tesla-t4; with vWS: nvidia-tesla-t4-vws)
- NVIDIA P4 (nvidia-tesla-p4; with vWS: nvidia-tesla-p4-vws)
- NVIDIA V100 (nvidia-tesla-v100)
- NVIDIA P100 (nvidia-tesla-p100; with vWS: nvidia-tesla-p100-vws)
You can also use some GPU machine types on AI Hypercomputer. AI Hypercomputer is a supercomputing system that is optimized to support your artificial intelligence (AI) and machine learning (ML) workloads. This option is recommended for creating a densely allocated, performance-optimized infrastructure that has integrations for Google Kubernetes Engine (GKE) and Slurm schedulers.
A4X machine series
A4X accelerator-optimized machine types use NVIDIA GB200 Grace Blackwell Superchips (nvidia-gb200) and are ideal for foundation model training and serving.
A4X is an exascale platform based on NVIDIA GB200 NVL72. Each machine has two sockets with NVIDIA Grace CPUs based on Arm Neoverse V2 cores. These CPUs are connected to four NVIDIA B200 Blackwell GPUs with fast chip-to-chip (NVLink-C2C) communication.
| Machine type | vCPU count 1 | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps) 2 | GPU count | GPU memory 3 (GB HBM3e) | 
|---|---|---|---|---|---|---|---|
| a4x-highgpu-4g | 140 | 884 | 12,000 | 6 | 2,000 | 4 | 720 | 
1 A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
2 Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
3 GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
A4 machine series
A4 accelerator-optimized machine types have NVIDIA B200 Blackwell GPUs (nvidia-b200) attached and are ideal for foundation model training and serving.
| Machine type | vCPU count 1 | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps) 2 | GPU count | GPU memory 3 (GB HBM3e) | 
|---|---|---|---|---|---|---|---|
| a4-highgpu-8g | 224 | 3,968 | 12,000 | 10 | 3,600 | 8 | 1,440 | 
1 A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
2 Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
3 GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
A3 machine series
A3 accelerator-optimized machine types have NVIDIA H100 SXM or NVIDIA H200 SXM GPUs attached.
A3 Ultra machine type
A3 Ultra machine types have NVIDIA H200 SXM GPUs (nvidia-h200-141gb) attached and provide the highest network performance in the A3 series. A3 Ultra machine types are ideal for foundation model training and serving.
| Machine type | vCPU count 1 | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps) 2 | GPU count | GPU memory 3 (GB HBM3e) | 
|---|---|---|---|---|---|---|---|
| a3-ultragpu-8g | 224 | 2,952 | 12,000 | 10 | 3,600 | 8 | 1,128 | 
1 A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
2 Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
3 GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
A3 Mega, High, and Edge machine types
To use NVIDIA H100 SXM GPUs, you have the following options:
- A3 Mega: these machine types have H100 SXM GPUs (nvidia-h100-mega-80gb) and are ideal for large-scale training and serving workloads.
- A3 High: these machine types have H100 SXM GPUs (nvidia-h100-80gb) and are well suited for both training and serving tasks.
- A3 Edge: these machine types have H100 SXM GPUs (nvidia-h100-80gb), are designed specifically for serving, and are available in a limited set of regions.
A3 Mega
| Machine type | vCPU count 1 | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps) 2 | GPU count | GPU memory 3 (GB HBM3) | 
|---|---|---|---|---|---|---|---|
| a3-megagpu-8g | 208 | 1,872 | 6,000 | 9 | 1,800 | 8 | 640 | 
A3 High
| Machine type | vCPU count 1 | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps) 2 | GPU count | GPU memory 3 (GB HBM3) | 
|---|---|---|---|---|---|---|---|
| a3-highgpu-1g | 26 | 234 | 750 | 1 | 25 | 1 | 80 | 
| a3-highgpu-2g | 52 | 468 | 1,500 | 1 | 50 | 2 | 160 | 
| a3-highgpu-4g | 104 | 936 | 3,000 | 1 | 100 | 4 | 320 | 
| a3-highgpu-8g | 208 | 1,872 | 6,000 | 5 | 1,000 | 8 | 640 | 
A3 Edge
| Machine type | vCPU count 1 | Instance memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps) 2 | GPU count | GPU memory 3 (GB HBM3) | 
|---|---|---|---|---|---|---|---|
| a3-edgegpu-8g | 208 | 1,872 | 6,000 | 5 | 800 for asia-south1 and northamerica-northeast2; 400 for all other A3 Edge regions | 8 | 640 | 
1 A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
2 Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
3 GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
A2 machine series
A2 accelerator-optimized machine types have NVIDIA A100 GPUs attached and are ideal for model fine-tuning, large-model inference, and cost-optimized inference.
The A2 machine series is available in two types:
- A2 Ultra: these machine types have A100 80GB GPUs (nvidia-a100-80gb) and Local SSD disks attached.
- A2 Standard: these machine types have A100 40GB GPUs (nvidia-tesla-a100) attached. You can also add Local SSD disks when creating an A2 Standard instance. For the number of disks you can attach, see Machine types that require you to choose a number of Local SSD disks.
A2 Ultra
| Machine type | vCPU count 1 | Instance memory (GB) | Attached Local SSD (GiB) | Maximum network bandwidth (Gbps) 2 | GPU count | GPU memory 3 (GB HBM2e) | 
|---|---|---|---|---|---|---|
| a2-ultragpu-1g | 12 | 170 | 375 | 24 | 1 | 80 | 
| a2-ultragpu-2g | 24 | 340 | 750 | 32 | 2 | 160 | 
| a2-ultragpu-4g | 48 | 680 | 1,500 | 50 | 4 | 320 | 
| a2-ultragpu-8g | 96 | 1,360 | 3,000 | 100 | 8 | 640 | 
A2 Standard
| Machine type | vCPU count 1 | Instance memory (GB) | Local SSD supported | Maximum network bandwidth (Gbps) 2 | GPU count | GPU memory 3 (GB HBM2) | 
|---|---|---|---|---|---|---|
| a2-highgpu-1g | 12 | 85 | Yes | 24 | 1 | 40 | 
| a2-highgpu-2g | 24 | 170 | Yes | 32 | 2 | 80 | 
| a2-highgpu-4g | 48 | 340 | Yes | 50 | 4 | 160 | 
| a2-highgpu-8g | 96 | 680 | Yes | 100 | 8 | 320 | 
| a2-megagpu-16g | 96 | 1,360 | Yes | 100 | 16 | 640 | 
1 A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
2 Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
3 GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
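Because A2 GPUs are pre-attached, you select the GPU count by choosing the machine type rather than with an --accelerator flag. A minimal sketch with placeholder instance name, zone, and image:

```bash
# Create an A2 Standard instance; one A100 40GB GPU is attached automatically
# by the a2-highgpu-1g machine type.
gcloud compute instances create my-a2-instance \
    --zone=us-central1-f \
    --machine-type=a2-highgpu-1g \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```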
G4 machine series
G4 accelerator-optimized machine types use NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs (nvidia-rtx-pro-6000) and are suitable for NVIDIA Omniverse simulation workloads, graphics-intensive applications, video transcoding, and virtual desktops. G4 machine types also provide a lower-cost solution for performing single-host inference and model tuning compared with A series machine types.
A key feature of the G4 series is support for direct GPU peer-to-peer (P2P) communication on multi-GPU machine types (g4-standard-96, g4-standard-192, and g4-standard-384). This lets GPUs within the same instance exchange data directly over the PCIe bus, without involving the CPU host. For more information, see G4 GPU peer-to-peer communication.
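After the NVIDIA driver is installed on a multi-GPU G4 instance, one way to confirm the P2P topology is with nvidia-smi, which ships with the driver. This is a quick sketch, not a required step:

```bash
# Print the GPU interconnect topology matrix. Entries such as PIX or PXB
# indicate PCIe paths between GPU pairs that can carry direct P2P traffic.
nvidia-smi topo -m
```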
| Machine type | vCPU count 1 | Instance memory (GB) | Maximum Titanium SSD supported (GiB) 2 | Physical NIC count | Maximum network bandwidth (Gbps) 3 | GPU count | GPU memory 4 (GB GDDR7) | 
|---|---|---|---|---|---|---|---|
| g4-standard-48 | 48 | 180 | 1,500 | 1 | 50 | 1 | 96 | 
| g4-standard-96 | 96 | 360 | 3,000 | 1 | 100 | 2 | 192 | 
| g4-standard-192 | 192 | 720 | 6,000 | 1 | 200 | 4 | 384 | 
| g4-standard-384 | 384 | 1,440 | 12,000 | 2 | 400 | 8 | 768 | 
1 A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
2 You can add Titanium SSD disks when creating a G4 instance. For the number of disks you can attach, see Machine types that require you to choose a number of Local SSD disks.
3 Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information, see Network bandwidth.
4 GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
G2 machine series
G2 accelerator-optimized machine types have NVIDIA L4 GPUs attached and are ideal for cost-optimized inference, graphics-intensive workloads, and high performance computing (HPC) workloads.
Each G2 machine type also has a default memory size and a custom memory range. The custom memory range defines the amount of memory that you can allocate to your instance for each machine type. You can also add Local SSD disks when creating a G2 instance. For the number of disks you can attach, see Machine types that require you to choose a number of Local SSD disks.
| Machine type | vCPU count 1 | Default instance memory (GB) | Custom instance memory range (GB) | Max Local SSD supported (GiB) | Maximum network bandwidth (Gbps) 2 | GPU count | GPU memory 3 (GB GDDR6) | 
|---|---|---|---|---|---|---|---|
| g2-standard-4 | 4 | 16 | 16 to 32 | 375 | 10 | 1 | 24 | 
| g2-standard-8 | 8 | 32 | 32 to 54 | 375 | 16 | 1 | 24 | 
| g2-standard-12 | 12 | 48 | 48 to 54 | 375 | 16 | 1 | 24 | 
| g2-standard-16 | 16 | 64 | 54 to 64 | 375 | 32 | 1 | 24 | 
| g2-standard-24 | 24 | 96 | 96 to 108 | 750 | 32 | 2 | 48 | 
| g2-standard-32 | 32 | 128 | 96 to 128 | 375 | 32 | 1 | 24 | 
| g2-standard-48 | 48 | 192 | 192 to 216 | 1,500 | 50 | 4 | 96 | 
| g2-standard-96 | 96 | 384 | 384 to 432 | 3,000 | 100 | 8 | 192 | 
1 A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
2 Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. For more information about network bandwidth, see Network bandwidth.
3 GPU memory is the memory on a GPU device that can be used for temporary storage of data. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
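As a sketch of using the custom memory range from the preceding table, the following command assumes G2 supports the standard custom machine type naming convention (g2-custom-VCPUS-MEMORY_MB); the instance name, zone, and image are placeholders:

```bash
# Create a G2 instance with 4 vCPUs and 24 GB (24576 MB) of memory,
# a value inside the 16 to 32 GB custom range shown for 4 vCPUs above.
gcloud compute instances create my-g2-instance \
    --zone=us-central1-a \
    --machine-type=g2-custom-4-24576 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```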
N1 machine series
You can attach the following GPU models to an N1 machine type, with the exception of the N1 shared-core machine types.
Unlike the machine types in the accelerator-optimized machine series, N1 machine types don't come with a set number of attached GPUs. Instead, you specify the number of GPUs to attach when creating the instance.
N1 instances with fewer GPUs have a lower limit on the maximum number of vCPUs. In general, a higher number of GPUs lets you create instances with more vCPUs and memory.
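For example, the following sketch attaches two T4 GPUs to an N1 instance; the instance name, zone, machine type, and image are placeholders to adapt to your vCPU and memory needs:

```bash
# Attach two NVIDIA T4 GPUs to an N1 instance at creation time.
# GPU instances must use a host maintenance policy of TERMINATE.
gcloud compute instances create my-n1-gpu-instance \
    --zone=us-central1-a \
    --machine-type=n1-standard-16 \
    --accelerator=type=nvidia-tesla-t4,count=2 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud
```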
N1+T4 GPUs
You can attach NVIDIA T4 GPUs to N1 general-purpose instances using the accelerator types nvidia-tesla-t4, or nvidia-tesla-t4-vws for virtual workstations.
Note: GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
N1+P4 GPUs
You can attach NVIDIA P4 GPUs to N1 general-purpose instances using the accelerator types nvidia-tesla-p4, or nvidia-tesla-p4-vws for virtual workstations.
Notes:
- GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
- For instances with attached NVIDIA P4 GPUs, Local SSD disks are only supported in zones us-central1-c and northamerica-northeast1-b.
N1+V100 GPUs
You can attach NVIDIA V100 GPUs to N1 general-purpose instances using the accelerator type nvidia-tesla-v100.
Notes:
- GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
- For instances with attached NVIDIA V100 GPUs, Local SSD disks aren't supported in us-east1-c.
N1+P100 GPUs
You can attach NVIDIA P100 GPUs to N1 general-purpose instances using the accelerator types nvidia-tesla-p100, or nvidia-tesla-p100-vws for virtual workstations.
Notes:
- For some NVIDIA P100 configurations, the maximum CPU and memory available depends on the zone in which the GPU resource runs; this applies in the us-east1-c, europe-west1-d, and europe-west1-b zones.
- GPU memory is the memory available on a GPU device that you can use for temporary data storage. It is separate from the instance's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
General comparison chart
The following table describes the GPU memory size, feature availability, and ideal workload types of different GPU models that are available on Compute Engine.
| GPU model | GPU memory | Interconnect | NVIDIA RTX Virtual Workstation (vWS) support | Best used for | 
|---|---|---|---|---|
| GB200 | 180 GB HBM3e @ 8 TBps | NVLink Full Mesh @ 1,800 GBps | No | Large-scale distributed training and inference of LLMs, Recommenders, HPC | 
| B200 | 180 GB HBM3e @ 8 TBps | NVLink Full Mesh @ 1,800 GBps | No | Large-scale distributed training and inference of LLMs, Recommenders, HPC | 
| H200 | 141 GB HBM3e @ 4.8 TBps | NVLink Full Mesh @ 900 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM | 
| H100 | 80 GB HBM3 @ 3.35 TBps | NVLink Full Mesh @ 900 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM | 
| A100 80GB | 80 GB HBM2e @ 1.9 TBps | NVLink Full Mesh @ 600 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM | 
| A100 40GB | 40 GB HBM2 @ 1.6 TBps | NVLink Full Mesh @ 600 GBps | No | ML Training, Inference, HPC | 
| RTX PRO 6000 | 96 GB GDDR7 with ECC @ 1,597 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding, HPC | 
| L4 | 24 GB GDDR6 @ 300 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding, HPC | 
| T4 | 16 GB GDDR6 @ 320 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding | 
| V100 | 16 GB HBM2 @ 900 GBps | NVLink Ring @ 300 GBps | No | ML Training, Inference, HPC | 
| P4 | 8 GB GDDR5 @ 192 GBps | N/A | Yes | Remote Visualization Workstations, ML Inference, and Video Transcoding | 
| P100 | 16 GB HBM2 @ 732 GBps | N/A | Yes | ML Training, Inference, HPC, Remote Visualization Workstations | 
To compare GPU pricing for the different GPU models and regions that are available on Compute Engine, see GPU pricing.
Performance comparison chart
The following table describes the performance specifications of different GPU models that are available on Compute Engine.
Compute performance
| GPU model | FP64 | FP32 | FP16 | INT8 | 
|---|---|---|---|---|
| GB200 | 90 TFLOPS | 180 TFLOPS | | | 
| B200 | 40 TFLOPS | 80 TFLOPS | | | 
| H200 | 34 TFLOPS | 67 TFLOPS | | | 
| H100 | 34 TFLOPS | 67 TFLOPS | | | 
| A100 80GB | 9.7 TFLOPS | 19.5 TFLOPS | | | 
| A100 40GB | 9.7 TFLOPS | 19.5 TFLOPS | | | 
| L4 | 0.5 TFLOPS 1 | 30.3 TFLOPS | | | 
| T4 | 0.25 TFLOPS 1 | 8.1 TFLOPS | | | 
| V100 | 7.8 TFLOPS | 15.7 TFLOPS | | | 
| P4 | 0.2 TFLOPS 1 | 5.5 TFLOPS | | 22 TOPS 2 | 
| P100 | 4.7 TFLOPS | 9.3 TFLOPS | 18.7 TFLOPS | | 
1 To allow FP64 code to work correctly, the T4, L4, and P4 GPU architectures include a small number of FP64 hardware units.
2 TeraOperations per Second.
Tensor core performance
| GPU model | FP64 | TF32 | Mixed-precision FP16/FP32 | INT8 | INT4 | FP8 | 
|---|---|---|---|---|---|---|
| GB200 | 90 TFLOPS | 2,500 TFLOPS 2 | 5,000 TFLOPS 1, 2 | 10,000 TOPS 2 | 20,000 TOPS 2 | 10,000 TFLOPS 2 | 
| B200 | 40 TFLOPS | 1,100 TFLOPS 2 | 4,500 TFLOPS 1, 2 | 9,000 TOPS 2 | | 9,000 TFLOPS 2 | 
| H200 | 67 TFLOPS | 989 TFLOPS 2 | 1,979 TFLOPS 1, 2 | 3,958 TOPS 2 | | 3,958 TFLOPS 2 | 
| H100 | 67 TFLOPS | 989 TFLOPS 2 | 1,979 TFLOPS 1, 2 | 3,958 TOPS 2 | | 3,958 TFLOPS 2 | 
| A100 80GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS 1 | 624 TOPS | 1,248 TOPS | | 
| A100 40GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS 1 | 624 TOPS | 1,248 TOPS | | 
| L4 | | 120 TFLOPS 2 | 242 TFLOPS 1, 2 | 485 TOPS 2 | | 485 TFLOPS 2 | 
| T4 | | | 65 TFLOPS | 130 TOPS | 260 TOPS | | 
| V100 | | | 125 TFLOPS | | | | 
| P4 | | | | | | | 
| P100 | | | | | | | 
1 For mixed-precision training, NVIDIA GB200, B200, H200, H100, A100, and L4 GPUs also support the bfloat16 data type.
2 NVIDIA GB200, B200, H200, H100, and L4 GPUs support structural sparsity, which you can use to double the performance of your models. The documented values apply when using structured sparsity; if you aren't using structured sparsity, the values are halved.
What's next?
- Learn more about Compute Engine GPUs.
- Check GPU regions and zones availability.
- Review Network bandwidths and GPUs.
- View GPU pricing details.

