TPU v5e
This document describes the architecture and supported configurations of Cloud TPU v5e.
TPU v5e supports single- and multi-host training and single-host inference. Multi-host inference is supported using Sax. For more information, see Cloud TPU inference.
System architecture
Each v5e chip contains one TensorCore. Each TensorCore has four matrix-multiply units (MXUs), a vector unit, and a scalar unit.
The following diagram illustrates a TPU v5e chip.

The following table shows the key chip specifications and their values for v5e.
| Key chip specifications | v5e values |
|---|---|
| Peak compute per chip (bf16) | 197 TFLOPs |
| HBM2 capacity and bandwidth | 16 GB, 819 GBps |
| Bidirectional inter-chip interconnect (ICI) bandwidth (per chip) | 400 GBps |
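Each chip is exposed to frameworks as one accelerator device (one TensorCore per chip). As a quick check of what a VM actually sees, the following is a minimal sketch that assumes JAX with TPU support is installed on the TPU VM; the exact `device_kind` string reported for v5e can vary with the JAX version.

```python
# Minimal sketch, assuming JAX with TPU support on a v5e TPU VM:
# list the chips attached to this VM. Each v5e chip appears as one JAX device.
import jax

for d in jax.local_devices():
    print(d.id, d.device_kind, d.platform)

print("chips on this VM:", jax.local_device_count())
print("chips in the slice:", jax.device_count())
```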
The following table shows Pod specifications and their values for v5e.
| Key Pod specifications | v5e values |
|---|---|
| TPU Pod size | 256 chips |
| Interconnect topology | 2D Torus |
| Peak compute per Pod | 100 PetaOps (Int8) |
| All-reduce bandwidth per Pod | 51.2 TBps |
| Bisection bandwidth per Pod | 1.6 TBps |
| Data center network bandwidth per Pod | 6.4 Tbps |
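The all-reduce figure above refers to the collective operation that training frameworks issue over the ICI. As a rough illustration of that collective (not a benchmark), here is a minimal JAX sketch, assuming JAX with TPU support on a v5e VM, that sums one value per locally attached chip:

```python
# Minimal sketch, assuming JAX with TPU support on a v5e VM: an all-reduce
# (psum) across the locally attached chips, the collective carried by the ICI.
import jax
import jax.numpy as jnp

n = jax.local_device_count()
# One shard per local chip; psum sums the shards across the "chips" axis.
x = jnp.arange(n, dtype=jnp.float32).reshape(n, 1)
allreduce = jax.pmap(lambda s: jax.lax.psum(s, axis_name="chips"), axis_name="chips")
print(allreduce(x))  # every chip holds the same summed result
```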
Configurations
Cloud TPU v5e is a combined training and inference (serving) product. Training jobs are optimized for throughput and availability, while serving jobs are optimized for latency. A training job run on TPUs provisioned for serving could have lower availability; similarly, a serving job run on TPUs provisioned for training could have higher latency.
The following 2D slice shapes are supported for v5e:
| Topology | Number of TPU chips | Number of hosts |
|---|---|---|
| 1x1 | 1 | 1/8 |
| 2x2 | 4 | 1/2 |
| 2x4 | 8 | 1 |
| 4x4 | 16 | 2 |
| 4x8 | 32 | 4 |
| 8x8 | 64 | 8 |
| 8x16 | 128 | 16 |
| 16x16 | 256 | 32 |
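On multi-host slices, every host runs its own copy of the workload and drives only the chips attached to that VM, while the framework presents the full slice as a single set of devices. The following is a hedged sketch of a topology check, assuming JAX with TPU support and that the same script is launched on every host of the slice:

```python
# Minimal sketch, assuming JAX with TPU support and that this script is
# launched on every host (VM) of the slice.
import jax

print("host index:      ", jax.process_index())
print("hosts in slice:  ", jax.process_count())
print("chips on this VM:", jax.local_device_count())
print("chips in slice:  ", jax.device_count())
```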
VM types
Each TPU VM in a v5e TPU slice contains 1, 4 or 8 chips. In 4-chip and smaller slices, all TPU chips share the same non-uniform memory access (NUMA) node.
For 8-chip v5e TPU VMs, CPU-TPU communication will be more efficient within NUMA partitions. For example, in the following figure, CPU0-Chip0 communication will be faster than CPU0-Chip4 communication.

The following table shows a comparison of TPU v5e VM types:
| VM type | Machine type (GKE API) | Number of vCPUs per VM | RAM (GB) per VM | Number of NUMA nodes per VM |
|---|---|---|---|---|
| 1-chip VM | ct5lp-hightpu-1t | 24 | 48 | 1 |
| 4-chip VM | ct5lp-hightpu-4t | 112 | 192 | 1 |
| 8-chip VM | ct5lp-hightpu-8t | 224 | 384 | 2 |
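Because the 8-chip VM type spans two NUMA nodes, it can help to check which vCPUs belong to which node before pinning the processes that feed each group of chips. The following is a hedged, Linux-only sketch that reads standard sysfs paths; it is a host-side check, not a Cloud TPU API.

```python
# Hedged, Linux-only sketch: list each NUMA node and the vCPUs attached to it
# by reading standard sysfs paths. No TPU-specific API is used.
import glob
import re

for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node_id = re.search(r"node(\d+)$", node).group(1)
    with open(node + "/cpulist") as f:
        cpus = f.read().strip()
    print(f"NUMA node {node_id}: vCPUs {cpus}")
```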
Cloud TPU v5e types for serving
Single-host serving is supported for up to 8 v5e chips. The following configurations are supported: 1x1, 2x2, and 2x4 slices, with 1, 4, and 8 chips respectively.

To provision TPUs for a serving job, use one of the following TPU slice sizes in your CLI or API TPU creation request:
| Number of TPU chips | Machine type (GKE API) |
|---|---|
| 1 | ct5lp-hightpu-1t |
| 4 | ct5lp-hightpu-4t |
| 8 | ct5lp-hightpu-8t |
For more information about managing TPUs, see Manage TPUs. For more information about the system architecture of Cloud TPU, see System architecture.
Serving on more than 8 v5e chips, also called multi-host serving, is supported using Sax. For more information, see Cloud TPU inference.
Cloud TPU v5e types for training
Training is supported for up to 256 chips.
To provision TPUs for a v5e training job, use one of the following TPU slice sizes in your CLI or API TPU creation request:
| Number of TPU chips | Machine type (GKE API) | Topology |
|---|---|---|
| 16 | ct5lp-hightpu-4t | 4x4 |
| 32 | ct5lp-hightpu-4t | 4x8 |
| 64 | ct5lp-hightpu-4t | 8x8 |
| 128 | ct5lp-hightpu-4t | 8x16 |
| 256 | ct5lp-hightpu-4t | 16x16 |
For more information about managing TPUs, see Manage TPUs. For more information about the system architecture of Cloud TPU, see System architecture.
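Training frameworks typically arrange the chips of a slice into a logical mesh that mirrors the 2D topology and then shard arrays across it. As a rough illustration (not the only way to set this up), the following is a hedged JAX sketch for a 16-chip 4x4 slice; the `data`/`model` axis names and the array shape are illustrative assumptions.

```python
# Minimal sketch, assuming JAX with TPU support on a 16-chip (4x4) v5e slice:
# lay the chips out as a 4x4 logical mesh and shard an array across it.
# Axis names and array shape are illustrative.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = mesh_utils.create_device_mesh((4, 4))   # must match the slice size
mesh = Mesh(devices, axis_names=("data", "model"))

x = jnp.ones((1024, 1024), dtype=jnp.float32)
x_sharded = jax.device_put(x, NamedSharding(mesh, P("data", "model")))
print(x_sharded.sharding)
```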

