MachineSpec

Specification of a single machine.

Fields

machineType string

Immutable. The type of the machine.

See the list of machine types supported for prediction.

See the list of machine types supported for custom training.

For DeployedModel this field is optional, and the default value is n1-standard-2. For BatchPredictionJob or as part of WorkerPoolSpec this field is required.

acceleratorType enum (AcceleratorType)

Immutable. The type of accelerator(s) that may be attached to the machine as per acceleratorCount.

acceleratorCount integer

The number of accelerators to attach to the machine.

For accelerator-optimized machine types (https://cloud.google.com/compute/docs/accelerator-optimized-machines), one can set acceleratorCount to any value from 1 to N on a machine with N GPUs. If acceleratorCount is less than or equal to N / 2, Vertex will co-schedule the replicas of the model into the same VM to save cost.

For example, if the machine type is a3-highgpu-8g, which has 8 H100 GPUs, one can set acceleratorCount to any value from 1 to 8. If acceleratorCount is 1, 2, 3, or 4, Vertex will co-schedule 8, 4, 2, or 2 replicas of the model, respectively, into the same VM to save cost.

When co-scheduling, the CPU, memory, and storage of the VM are distributed among the replicas on that VM. For example, a co-scheduled replica requesting 2 GPUs out of an 8-GPU VM can expect to receive 25% of the CPU, memory, and storage of the VM.

Note that co-scheduling is not compatible with multihostGpuNodeCount. When multihostGpuNodeCount is set, co-scheduling is not enabled.
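The co-scheduling arithmetic described above can be sketched in a few lines of Python. This is an illustrative model only, not the service's implementation: the function name `co_scheduling_plan` is hypothetical, the replica count is assumed to be the integer quotient N / acceleratorCount (which reproduces the 8, 4, 2, 2 example), and the per-replica resource share is assumed to be acceleratorCount / N, generalized from the single 2-of-8 example.

```python
from fractions import Fraction


def co_scheduling_plan(gpus_per_vm: int, accelerator_count: int):
    """Return (replicas per VM, assumed share of VM resources per replica)."""
    if not 1 <= accelerator_count <= gpus_per_vm:
        raise ValueError("acceleratorCount must be between 1 and the VM's GPU count")
    # Co-scheduling applies only when acceleratorCount <= N / 2.
    if accelerator_count * 2 > gpus_per_vm:
        # Assumption: a replica that is not co-scheduled gets the whole VM.
        return 1, Fraction(1)
    # Assumption: as many replicas as fit, each with a share proportional
    # to the number of GPUs it requests.
    replicas = gpus_per_vm // accelerator_count
    share = Fraction(accelerator_count, gpus_per_vm)
    return replicas, share


# An 8-GPU machine such as a3-highgpu-8g.
for count in range(1, 9):
    replicas, share = co_scheduling_plan(8, count)
    print(f"acceleratorCount={count}: {replicas} replica(s), {float(share):.1%} of VM each")
```

Under these assumptions, `co_scheduling_plan(8, 2)` yields 4 replicas with a quarter of the VM each, matching the 25% figure given above.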

gpuPartitionSize string

Optional. Immutable. The Nvidia GPU partition size.

When specified, the requested accelerators will be partitioned into smaller GPU partitions. For example, if the request is for 8 units of NVIDIA A100 GPUs, and gpuPartitionSize="1g.10gb", the service will create 8 * 7 = 56 partitioned MIG instances.

The partition size must be a value supported by the requested accelerator. Refer to Nvidia GPU Partitioning for the available partition sizes.

If set, the acceleratorCount should be set to 1.

tpuTopology string

Immutable. The topology of the TPUs. Corresponds to the TPU topologies available from GKE. (Example: tpuTopology: "2x2x1").

reservationAffinity object (ReservationAffinity)

Optional. Immutable. Configuration controlling how this resource pool consumes reservation.

JSON representation

{
  "machineType": string,
  "acceleratorType": enum (`AcceleratorType`),
  "acceleratorCount": integer,
  "gpuPartitionSize": string,
  "tpuTopology": string,
  "reservationAffinity": {
    object (`ReservationAffinity`)
  }
}
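As a minimal sketch of how a request body with this shape might be assembled, the Python snippet below builds a MachineSpec dictionary and serializes it to JSON. The helper `build_machine_spec` is hypothetical and not part of any client library; the field names come from the JSON representation above, and the values reuse the a3-highgpu-8g example from the acceleratorCount description.

```python
import json

# Field names taken from the JSON representation above.
MACHINE_SPEC_FIELDS = {
    "machineType",
    "acceleratorType",
    "acceleratorCount",
    "gpuPartitionSize",
    "tpuTopology",
    "reservationAffinity",
}


def build_machine_spec(**fields):
    """Return a MachineSpec-shaped dict, omitting fields left as None."""
    unknown = set(fields) - MACHINE_SPEC_FIELDS
    if unknown:
        raise ValueError(f"unknown MachineSpec fields: {sorted(unknown)}")
    return {name: value for name, value in fields.items() if value is not None}


spec = build_machine_spec(
    machineType="a3-highgpu-8g",
    acceleratorType="NVIDIA_H100_80GB",
    acceleratorCount=2,
    tpuTopology=None,  # not a TPU machine, so the field is omitted
)
print(json.dumps(spec, indent=2))
```

Rejecting unknown keys locally catches misspelled field names before the request is sent; whether the service itself rejects or ignores such fields is not stated here.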

AcceleratorType

Represents a hardware accelerator type.

Enums
`ACCELERATOR_TYPE_UNSPECIFIED`	Unspecified accelerator type, which means no accelerator.
`NVIDIA_TESLA_K80`	Deprecated: Nvidia Tesla K80 GPU has reached end of support; see https://cloud.google.com/compute/docs/eol/k80-eol.
`NVIDIA_TESLA_P100`	Nvidia Tesla P100 GPU.
`NVIDIA_TESLA_V100`	Nvidia Tesla V100 GPU.
`NVIDIA_TESLA_P4`	Nvidia Tesla P4 GPU.
`NVIDIA_TESLA_T4`	Nvidia Tesla T4 GPU.
`NVIDIA_TESLA_A100`	Nvidia Tesla A100 GPU.
`NVIDIA_A100_80GB`	Nvidia A100 80GB GPU.
`NVIDIA_L4`	Nvidia L4 GPU.
`NVIDIA_H100_80GB`	Nvidia H100 80GB GPU.
`NVIDIA_H100_MEGA_80GB`	Nvidia H100 Mega 80GB GPU.
`NVIDIA_H200_141GB`	Nvidia H200 141GB GPU.
`NVIDIA_B200`	Nvidia B200 GPU.
`NVIDIA_GB200`	Nvidia GB200 GPU.
`NVIDIA_RTX_PRO_6000`	Nvidia RTX Pro 6000 GPU.
`TPU_V2`	TPU v2.
`TPU_V3`	TPU v3.
`TPU_V4_POD`	TPU v4.
`TPU_V5_LITEPOD`	TPU v5.

ReservationAffinity

A ReservationAffinity can be used to configure a Vertex AI resource (e.g., a DeployedModel) to draw its Compute Engine resources from a Shared Reservation, or exclusively from on-demand capacity.

Fields

reservationAffinityType enum (Type)

Required. Specifies the reservation affinity type.

key string

Optional. Corresponds to the label key of a reservation resource. To target a SPECIFIC_RESERVATION by name, use compute.googleapis.com/reservation-name as the key and specify the name of your reservation as its value.

values[] string

Optional. Corresponds to the label values of a reservation resource. This must be the full resource name of the reservation or reservation block.

JSON representation
{
  "reservationAffinityType": enum (`Type`),
  "key": string,
  "values": [
    string
  ]
}

Type

Identifies a type of reservation affinity.

Enums
`TYPE_UNSPECIFIED`	Default value. This should not be used.
`NO_RESERVATION`	Do not consume from any reserved capacity; use only on-demand capacity.
`ANY_RESERVATION`	Consume any reservation available, falling back to on-demand.
`SPECIFIC_RESERVATION`	Consume from a specific reservation. When chosen, the reservation must be identified via the `key` and `values` fields.
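
The rules above for `key` and `values` can be illustrated with a short Python sketch. The helper `reservation_affinity` is hypothetical, and the reservation name in the usage line is a placeholder whose `projects/.../zones/.../reservations/...` layout is an assumption about what a full resource name looks like; substitute the name of an actual reservation.

```python
RESERVATION_NAME_KEY = "compute.googleapis.com/reservation-name"


def reservation_affinity(affinity_type, reservation_names=()):
    """Return a ReservationAffinity-shaped dict for the given Type value."""
    if affinity_type == "SPECIFIC_RESERVATION":
        # A specific reservation must be identified via key and values.
        if not reservation_names:
            raise ValueError("SPECIFIC_RESERVATION needs at least one reservation name")
        return {
            "reservationAffinityType": affinity_type,
            "key": RESERVATION_NAME_KEY,
            "values": list(reservation_names),
        }
    if affinity_type in ("NO_RESERVATION", "ANY_RESERVATION"):
        # key and values are optional and not needed for these types.
        return {"reservationAffinityType": affinity_type}
    # TYPE_UNSPECIFIED should not be used, so it is rejected here as well.
    raise ValueError(f"unsupported reservation affinity type: {affinity_type}")


# Placeholder resource name, for illustration only.
affinity = reservation_affinity(
    "SPECIFIC_RESERVATION",
    ["projects/my-project/zones/us-central1-a/reservations/my-reservation"],
)
print(affinity)
```

The resulting dictionary can be used as the value of the `reservationAffinity` field of a MachineSpec.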
