Slice is a custom resource definition that lets you define and manage accelerator slices in Google Kubernetes Engine (GKE). This resource lets you group TPU partitions to form larger slices for distributed training workloads.
When you create or update a Slice custom resource, a validating webhook checks the resource specification so that the Kubernetes API server accepts only valid Slice definitions.
apiVersion: accelerator.gke.io/v1beta1
kind: Slice
metadata:
  name: test-slice
spec:
  type: "tpu7x"
  topology: "4x4x8"
  partitionIds:
  - a9476d1b02bd4f4e75ffffae3bd23c01
  - ba898ffcac0ad0946e8ff036d771ee53
status:
  conditions:
  - type: Ready
    status: "False"
    reason: FAILED
    message: ""
    lastTransitionTime: "2026-01-11T23:45:38Z"
Slice specification
metadata:
  name: string
spec:
  type: string
  topology: string
  partitionIds: []string
metadata.name
required
string
The name of the Slice resource. It must adhere to the following rules to ensure compatibility with underlying Compute Engine resource naming conventions:
- Length: the name must be 49 characters or fewer. The controller appends a hyphen and an 8-character cluster hash to create Compute Engine resource names, which have a 63-character limit.
- Format: it must match the following regular expression: ^[a-z]([-a-z0-9]*[a-z0-9])?$. This match means the name has the following characteristics:
  - Must start with a lowercase letter.
  - Can only contain lowercase letters, numbers, and hyphens (-).
  - Must end with a lowercase letter or a number (it cannot end with a hyphen).
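The naming rules above can be checked locally before a manifest is submitted. The following is a minimal sketch, not the webhook's actual implementation; the function name validate_slice_name is hypothetical, and only the length limit and the regular expression come from the rules above.

```python
import re

# Both values are taken from the naming rules above.
MAX_NAME_LENGTH = 49
NAME_PATTERN = re.compile(r"^[a-z]([-a-z0-9]*[a-z0-9])?$")


def validate_slice_name(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes.

    Hypothetical helper for illustration only.
    """
    errors = []
    if len(name) > MAX_NAME_LENGTH:
        errors.append(
            f"name has {len(name)} characters; the limit is {MAX_NAME_LENGTH}"
        )
    if not NAME_PATTERN.fullmatch(name):
        errors.append("name does not match ^[a-z]([-a-z0-9]*[a-z0-9])?$")
    return errors


if __name__ == "__main__":
    for candidate in ("test-slice", "Test-Slice", "slice-"):
        print(candidate, validate_slice_name(candidate))
```

Returning a list of violations instead of raising on the first one lets a caller report every problem with a name in a single pass.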
metadata.annotations
optional
object
Annotations for the Slice resource.
slice.gke.io/retry-on-failure: "true"
If set to "true", the slice controller retries the creation of the slice if it fails.
type
required
string
The type of accelerator for this slice. The type field must be one of the following supported values:
- tpu7x
Any other value is rejected.
topology
required
string
The topology for the accelerator slice. Requirements for the tpu7x accelerator topology include the following rules:
- The topology must be a three-dimensional string in the following format: AxBxC. For example, 4x8x8.
- Each dimension (A, B, and C) must be a multiple of four.
- The dimensions must be sorted in non-decreasing order: A <= B <= C. For example, 4x8x4 is invalid; it should be 4x4x8.
- The product of the dimensions (A*B*C) must not exceed 9,216.
- The number of partitions is calculated based on the total number of chips, where each partition consists of 64 chips. The number of items in your spec.partitionIds list must exactly match the calculated number of partitions ((A*B*C) / 64).
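As an illustration of the topology rules above, the following sketch parses a tpu7x topology string and computes how many partition IDs it calls for. It is not the webhook's implementation; the function names are hypothetical, and rejecting a dimension of zero is an assumption, because the rules above only say "a multiple of four".

```python
# Constants taken from the topology rules above.
CHIPS_PER_PARTITION = 64
MAX_CHIPS = 9216


def parse_tpu7x_topology(topology: str) -> tuple[int, int, int]:
    """Parse "AxBxC" and enforce the tpu7x rules; raise ValueError if invalid.

    Hypothetical helper for illustration only.
    """
    parts = topology.split("x")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"{topology!r} is not in the form AxBxC")
    a, b, c = (int(p) for p in parts)
    # Assumption: zero is rejected even though it is formally a multiple of four.
    if any(d == 0 or d % 4 != 0 for d in (a, b, c)):
        raise ValueError("each dimension must be a positive multiple of four")
    if not a <= b <= c:
        raise ValueError("dimensions must be sorted so that A <= B <= C")
    if a * b * c > MAX_CHIPS:
        raise ValueError(f"A*B*C must not exceed {MAX_CHIPS}")
    return a, b, c


def expected_partition_count(topology: str) -> int:
    """Return (A*B*C) / 64, the required length of spec.partitionIds."""
    a, b, c = parse_tpu7x_topology(topology)
    # Every dimension is a multiple of 4, so the product is a multiple of 64.
    return (a * b * c) // CHIPS_PER_PARTITION


if __name__ == "__main__":
    print(expected_partition_count("4x4x8"))  # 128 chips / 64 = 2
```

For the 4x4x8 topology, the product is 128 chips, so the spec.partitionIds list needs exactly two entries.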
partitionIds
required
string[]
A list of strings that identify the partitions making up the slice. All values within the partitionIds list must be unique. Duplicate partition IDs are not allowed.
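The uniqueness rule can be checked with a short helper before the resource is applied. This is a sketch only; find_duplicate_partition_ids is a hypothetical name, and the sample IDs are the ones used in the example manifest.

```python
from collections import Counter


def find_duplicate_partition_ids(partition_ids: list[str]) -> list[str]:
    """Return the partition IDs that appear more than once, sorted.

    Hypothetical helper for illustration only; an empty result means
    the list satisfies the uniqueness rule.
    """
    counts = Counter(partition_ids)
    return sorted(pid for pid, n in counts.items() if n > 1)


if __name__ == "__main__":
    ids = [
        "a9476d1b02bd4f4e75ffffae3bd23c01",
        "ba898ffcac0ad0946e8ff036d771ee53",
    ]
    print(find_duplicate_partition_ids(ids))
```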
Slice status
conditions:
- type: string
  status: string
  reason: string
  message: string
  lastTransitionTime: string
conditions[]
object
List of status conditions for the Slice.
conditions.type
string
Condition type.
conditions.status
string
Condition status. Possible values are True or False.
conditions.reason
string
A short description of the lifecycle state of the slice. Possible values are the following:
- SliceNotCreated: the slice has not been created yet. The slice controller is initializing and performing preflight checks.
- SliceCreationFailed: creation failed because prerequisites were not met (for example, required Compute Engine resources are missing) or preflight checks failed.
- ACTIVATING: the slice is being formed (stitched).
- ACTIVE: the slice is fully formed, healthy, and ready to execute workloads.
- ACTIVE_DEGRADED: the slice is formed but includes degraded sub-blocks. Workloads can run, but performance might be impacted. The slice is resilient to single OCS failures.
- DEACTIVATING: the slice is being dismantled.
- FAILED: the slice is no longer ready to execute workloads. This state occurs if initial formation failed or if an active slice experienced a critical software or hardware failure.
- INCOMPLETE: a transitional step between the slice formation and deformation processes.
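A client that reads conditions.reason might group the values above by whether the descriptions say workloads can run. The following sketch does only that; the names are hypothetical, and how each reason maps to the status of the Ready condition is not stated here, so the code makes no claim about it.

```python
# Reason values listed above.
KNOWN_REASONS = frozenset({
    "SliceNotCreated",
    "SliceCreationFailed",
    "ACTIVATING",
    "ACTIVE",
    "ACTIVE_DEGRADED",
    "DEACTIVATING",
    "FAILED",
    "INCOMPLETE",
})

# Reasons whose descriptions say that workloads can run.
RUNNABLE_REASONS = frozenset({"ACTIVE", "ACTIVE_DEGRADED"})


def can_run_workloads(reason: str) -> bool:
    """Return True if the description of `reason` says workloads can run.

    Hypothetical helper for illustration only; unknown reasons raise
    ValueError instead of being silently treated as not runnable.
    """
    if reason not in KNOWN_REASONS:
        raise ValueError(f"unknown condition reason: {reason!r}")
    return reason in RUNNABLE_REASONS


if __name__ == "__main__":
    for reason in ("ACTIVE", "ACTIVE_DEGRADED", "FAILED"):
        print(reason, can_run_workloads(reason))
```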
conditions.message
string
A detailed description of the condition's status.
conditions.lastTransitionTime
string
A timestamp of the most recent status change.

