Configure container settings for custom training

When you perform custom training, you must specify what machine learning (ML) code you want Vertex AI to run. To do this, configure training container settings for either a custom container or a Python training application that runs on a prebuilt container.

To determine whether you want to use a custom container or a prebuilt container, read Training code requirements.

This document describes the fields of the Vertex AI API that you must specify in either of the preceding cases.

Where to specify container settings

Specify configuration details within a WorkerPoolSpec. Depending on how you perform custom training, put this WorkerPoolSpec in one of the following API fields:

  • CustomJob.jobSpec.workerPoolSpecs, if you are creating a CustomJob directly.

  • HyperparameterTuningJob.trialJobSpec.workerPoolSpecs, if you are creating a HyperparameterTuningJob.

  • TrainingPipeline.trainingTaskInputs.workerPoolSpecs, if you are creating a TrainingPipeline that creates a CustomJob.

If you are performing distributed training, you can use different settings for each worker pool.
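For example, a distributed training job might dedicate one worker pool to the chief replica and a second pool to additional workers. The following sketch shows such a configuration as a Python data structure whose field names follow the Vertex AI REST API; the machine types, replica counts, and image URI are placeholder assumptions, not values from this guide:

```python
# Sketch of a workerPoolSpecs list for distributed training.
# Each pool can use its own machine type and replica count.
worker_pool_specs = [
    {  # Pool 0: the chief replica.
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/my-project/my-training-image"},
    },
    {  # Pool 1: additional workers, on a smaller machine type.
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 3,
        "container_spec": {"image_uri": "gcr.io/my-project/my-training-image"},
    },
]
```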

Configure container settings

Depending on whether you are using a prebuilt container or a custom container, you must specify different fields within the WorkerPoolSpec. Select the tab for your scenario:

Prebuilt container

  1. Select a prebuilt container that supports the ML framework you plan to use for training. Specify one of the container image's URIs in the pythonPackageSpec.executorImageUri field.

  2. Specify the Cloud Storage URIs of your Python training application in the pythonPackageSpec.packageUris field.

  3. Specify your training application's entry point module in the pythonPackageSpec.pythonModule field.

  4. Optionally, specify a list of command-line arguments to pass to your training application's entry point module in the pythonPackageSpec.args field.
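The four steps above map directly onto one pythonPackageSpec. The following sketch shows them as a Python data structure whose field names follow the Vertex AI REST API; the machine type, bucket path, module name, and executor image URI are placeholder assumptions:

```python
# Sketch of a WorkerPoolSpec that uses a prebuilt container.
# All values are placeholders; field names follow the Vertex AI REST API.
worker_pool_spec = {
    "machine_spec": {"machine_type": "n1-standard-4"},
    "replica_count": 1,
    "python_package_spec": {
        # Step 1: URI of a prebuilt training container image.
        "executor_image_uri": "us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-11:latest",
        # Step 2: Cloud Storage URIs of your packaged training application.
        "package_uris": ["gs://my-bucket/trainer-0.1.tar.gz"],
        # Step 3: entry point module of your training application.
        "python_module": "trainer.task",
        # Step 4 (optional): command-line arguments passed to the module.
        "args": ["--epochs=10"],
    },
}
```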

The following examples highlight where you specify these container settings when you create a CustomJob:

Console

In the Google Cloud console, you can't create a CustomJob directly. However, you can create a TrainingPipeline that creates a CustomJob. When you create a TrainingPipeline in the Google Cloud console, you can specify prebuilt container settings in certain fields on the Training container step:

  • pythonPackageSpec.executorImageUri: Use the Model framework and Model framework version drop-down lists.

  • pythonPackageSpec.packageUris: Use the Package location field.

  • pythonPackageSpec.pythonModule: Use the Python module field.

  • pythonPackageSpec.args: Use the Arguments field.

gcloud

gcloud ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --python-package-uris=PYTHON_PACKAGE_URIS \
  --worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,executor-image-uri=PYTHON_PACKAGE_EXECUTOR_IMAGE_URI,python-module=PYTHON_MODULE

For more context, read the guide to creating a CustomJob.

Custom container

  1. Specify the Artifact Registry or Docker Hub URI of your custom container in the containerSpec.imageUri field.

  2. Optionally, if you want to override the ENTRYPOINT or CMD instructions in your container, specify the containerSpec.command or containerSpec.args fields. These fields affect how your container runs according to the following rules:

    • If you specify neither field: Your container runs according to its ENTRYPOINT instruction and CMD instruction (if it exists). Refer to the Docker documentation about how CMD and ENTRYPOINT interact.

    • If you specify only containerSpec.command: Your container runs with the value of containerSpec.command replacing its ENTRYPOINT instruction. If the container has a CMD instruction, it is ignored.

    • If you specify only containerSpec.args: Your container runs according to its ENTRYPOINT instruction, with the value of containerSpec.args replacing its CMD instruction.

    • If you specify both fields: Your container runs with containerSpec.command replacing its ENTRYPOINT instruction and containerSpec.args replacing its CMD instruction.
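As an illustration of these rules, the following sketch shows a containerSpec that overrides both instructions. Field names follow the Vertex AI REST API; the image URI, command, and arguments are placeholder assumptions:

```python
# Sketch of a containerSpec that overrides both ENTRYPOINT and CMD.
# All values are placeholders.
container_spec = {
    "image_uri": "us-central1-docker.pkg.dev/my-project/my-repo/trainer:latest",
    # Replaces the image's ENTRYPOINT instruction.
    "command": ["python3", "trainer/task.py"],
    # Replaces the image's CMD instruction; passed as arguments to `command`.
    "args": ["--epochs=10", "--batch-size=32"],
}
# Omitting "command" keeps the image's ENTRYPOINT; omitting "args" keeps its CMD.
```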

The following example highlights where you can specify some of these container settings when you create a CustomJob:

Console

In the Google Cloud console, you can't create a CustomJob directly. However, you can create a TrainingPipeline that creates a CustomJob. When you create a TrainingPipeline in the Google Cloud console, you can specify custom container settings in certain fields on the Training container step:

  • containerSpec.imageUri: Use the Container image field.

  • containerSpec.command: This API field is not configurable in the Google Cloud console.

  • containerSpec.args: Use the Arguments field.

gcloud

gcloud ai custom-jobs create \
  --region=LOCATION \
  --display-name=JOB_NAME \
  --worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,container-image-uri=CUSTOM_CONTAINER_IMAGE_URI

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .

import com.google.cloud.aiplatform.v1.AcceleratorType;
import com.google.cloud.aiplatform.v1.ContainerSpec;
import com.google.cloud.aiplatform.v1.CustomJob;
import com.google.cloud.aiplatform.v1.CustomJobSpec;
import com.google.cloud.aiplatform.v1.JobServiceClient;
import com.google.cloud.aiplatform.v1.JobServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.WorkerPoolSpec;
import java.io.IOException;

// Create a custom job to run machine learning training code in Vertex AI
public class CreateCustomJobSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String displayName = "DISPLAY_NAME";
    // Vertex AI runs your training application in a Docker container image. A Docker container
    // image is a self-contained software package that includes code and all dependencies. Learn
    // more about preparing your training application at
    // https://cloud.google.com/vertex-ai/docs/training/overview#prepare_your_training_application
    String containerImageUri = "CONTAINER_IMAGE_URI";
    createCustomJobSample(project, displayName, containerImageUri);
  }

  static void createCustomJobSample(String project, String displayName, String containerImageUri)
      throws IOException {
    JobServiceSettings settings =
        JobServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (JobServiceClient client = JobServiceClient.create(settings)) {
      MachineSpec machineSpec =
          MachineSpec.newBuilder()
              .setMachineType("n1-standard-4")
              .setAcceleratorType(AcceleratorType.NVIDIA_TESLA_T4)
              .setAcceleratorCount(1)
              .build();

      ContainerSpec containerSpec =
          ContainerSpec.newBuilder().setImageUri(containerImageUri).build();

      WorkerPoolSpec workerPoolSpec =
          WorkerPoolSpec.newBuilder()
              .setMachineSpec(machineSpec)
              .setReplicaCount(1)
              .setContainerSpec(containerSpec)
              .build();

      CustomJobSpec customJobSpecJobSpec =
          CustomJobSpec.newBuilder().addWorkerPoolSpecs(workerPoolSpec).build();

      CustomJob customJob =
          CustomJob.newBuilder()
              .setDisplayName(displayName)
              .setJobSpec(customJobSpecJobSpec)
              .build();

      LocationName parent = LocationName.of(project, location);
      CustomJob response = client.createCustomJob(parent, customJob);
      System.out.format("response: %s\n", response);
      System.out.format("Name: %s\n", response.getName());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const customJobDisplayName = 'YOUR_CUSTOM_JOB_DISPLAY_NAME';
// const containerImageUri = 'YOUR_CONTAINER_IMAGE_URI';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Job Service Client library
const {JobServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const jobServiceClient = new JobServiceClient(clientOptions);

async function createCustomJob() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const customJob = {
    displayName: customJobDisplayName,
    jobSpec: {
      workerPoolSpecs: [
        {
          machineSpec: {
            machineType: 'n1-standard-4',
            acceleratorType: 'NVIDIA_TESLA_T4',
            acceleratorCount: 1,
          },
          replicaCount: 1,
          containerSpec: {
            imageUri: containerImageUri,
            command: [],
            args: [],
          },
        },
      ],
    },
  };
  const request = {parent, customJob};

  // Create custom job request
  const [response] = await jobServiceClient.createCustomJob(request);

  console.log('Create custom job response:\n', JSON.stringify(response));
}
createCustomJob();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from google.cloud import aiplatform


def create_custom_job_sample(
    project: str,
    display_name: str,
    container_image_uri: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.JobServiceClient(client_options=client_options)
    custom_job = {
        "display_name": display_name,
        "job_spec": {
            "worker_pool_specs": [
                {
                    "machine_spec": {
                        "machine_type": "n1-standard-4",
                        "accelerator_type": aiplatform.gapic.AcceleratorType.NVIDIA_TESLA_K80,
                        "accelerator_count": 1,
                    },
                    "replica_count": 1,
                    "container_spec": {
                        "image_uri": container_image_uri,
                        "command": [],
                        "args": [],
                    },
                }
            ]
        },
    }
    parent = f"projects/{project}/locations/{location}"
    response = client.create_custom_job(parent=parent, custom_job=custom_job)
    print("response:", response)

For more context, read the guide to creating a CustomJob.

What's next
