This page shows you how to run a TensorFlow Keras training application on Vertex AI. This particular application trains an image classification model that can classify flowers by type.
This tutorial has several pages:

- Training a custom image classification model.

- Serving predictions from a custom image classification model.

Each page assumes that you have already performed the instructions from the previous pages of the tutorial.
The rest of this document assumes that you are using the same Cloud Shell environment that you created when following the first page of this tutorial. If your original Cloud Shell session is no longer open, you can return to the environment by doing the following:

- In the Google Cloud console, activate Cloud Shell.

- In the Cloud Shell session, run the following command:

  cd hello-custom-sample
Run a custom training pipeline
This section describes how to use the training package that you uploaded to Cloud Storage to run a Vertex AI custom training pipeline.
- In the Google Cloud console, in the Vertex AI section, go to the Training pipelines page.

- Click Create to open the Train new model pane.

- On the Choose training method step, do the following:

  - In the Dataset drop-down list, select No managed dataset. This particular training application loads data from the TensorFlow Datasets library rather than a managed Vertex AI dataset.

  - Ensure that Custom training (advanced) is selected.

  Click Continue.

- On the Model details step, in the Name field, enter hello_custom. Click Continue.
- On the Training container step, provide Vertex AI with the information it needs to use the training package that you uploaded to Cloud Storage:

  - Select Prebuilt container.

  - In the Model framework drop-down list, select TensorFlow.

  - In the Model framework version drop-down list, select 2.3.

  - In the Package location field, enter cloud-samples-data/ai-platform/hello-custom/hello-custom-sample-v1.tar.gz.

  - In the Python module field, enter trainer.task. trainer is the name of the Python package in your tarball, and task.py contains your training code. Therefore, trainer.task is the name of the module that you want Vertex AI to run.

  - In the Model output directory field, click Browse. Do the following in the Select folder pane:

    - Navigate to your Cloud Storage bucket.

    - Click Create new folder.

    - Name the new folder output. Then click Create.

    - Click Select.

    Confirm that the field has the value gs://BUCKET_NAME/output, where BUCKET_NAME is the name of your Cloud Storage bucket. This value gets passed to Vertex AI in the baseOutputDirectory API field, which sets several environment variables that your training application can access when it runs.

    For example, when you set this field to gs://BUCKET_NAME/output, Vertex AI sets the AIP_MODEL_DIR environment variable to gs://BUCKET_NAME/output/model. At the end of training, Vertex AI uses any artifacts in the AIP_MODEL_DIR directory to create a model resource.

    Learn more about the environment variables set by this field.

  - Click Continue.
- On the optional Hyperparameters step, make sure that the Enable hyperparameter tuning checkbox is cleared. This tutorial does not use hyperparameter tuning. Click Continue.

- On the Compute and pricing step, allocate resources for the custom training job:

  - In the Region drop-down list, select us-central1 (Iowa).

  - In the Machine type drop-down list, select n1-standard-4 from the Standard section.

  Do not add any accelerators or worker pools for this tutorial. Click Continue.
- On the Prediction container step, provide Vertex AI with the information it needs to serve predictions:

  - Select Prebuilt container.

  - In the Prebuilt container settings section, do the following:

    - In the Model framework drop-down list, select TensorFlow.

    - In the Model framework version drop-down list, select 2.3.

    - In the Accelerator type drop-down list, select None.

    - Confirm that the Model directory field has the value gs://BUCKET_NAME/output, where BUCKET_NAME is the name of your Cloud Storage bucket. This matches the Model output directory value that you provided in a previous step.

  - Leave the fields in the Predict schemata section blank.

- Click Start training to start the custom training pipeline.
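The environment-variable behavior described in the Model output directory step is what lets a training application save its artifacts to the right place. The following sketch shows one way a module like the sample's task.py might resolve its save path; only AIP_MODEL_DIR and its gs://BUCKET_NAME/output/model layout come from this tutorial, while the helper name and the local fallback are illustrative:

```python
import os

def resolve_model_dir(default: str = "model_output") -> str:
    """Return the directory where the trained model should be saved.

    When Vertex AI runs a custom job with baseOutputDirectory set to
    gs://BUCKET_NAME/output, it exports AIP_MODEL_DIR as
    gs://BUCKET_NAME/output/model. Outside Vertex AI, fall back to a
    local directory so the script stays runnable during development.
    """
    return os.environ.get("AIP_MODEL_DIR", default)

# At the end of training, a Keras application would typically call
# something like:
#   model.save(resolve_model_dir())
```

Because the fallback keeps the function total, the same training code runs unchanged both inside a Vertex AI custom job and on a local workstation.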
You can now view your new training pipeline, which is named hello_custom, on the Training page. (You might need to refresh the page.) The training pipeline does two main things:

- The training pipeline creates a custom job resource named hello_custom-custom-job. After a few moments, you can view this resource on the Custom jobs page of the Training section. The custom job runs the training application using the computing resources that you specified in this section.

- After the custom job completes, the training pipeline finds the artifacts that your training application creates in the output/model/ directory of your Cloud Storage bucket. It uses these artifacts to create a model resource.
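Since this tutorial uses TensorFlow, the artifacts under output/model/ should be a TensorFlow SavedModel export. If you download a copy, you can sanity-check it with a small helper like the one below; this is an illustrative sketch, not part of the sample, and the expected file names come from the SavedModel format (an assets/ directory may also be present):

```python
import os

def looks_like_saved_model(path: str) -> bool:
    """Heuristically check whether `path` contains a TensorFlow SavedModel.

    A SavedModel export normally contains a saved_model.pb graph file
    and a variables/ directory holding the trained weights.
    """
    entries = set(os.listdir(path))
    return "saved_model.pb" in entries and "variables" in entries
```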
Monitor training
To view training logs, do the following:

- In the Google Cloud console, in the Vertex AI section, go to the Custom jobs page.

- To view details for the CustomJob that you just created, click hello_custom-custom-job in the list.

- On the job details page, click View logs.
View your trained model
When the custom training pipeline completes, you can find the trained model in the Google Cloud console, in the Vertex AI section, on the Models page. The model has the name hello_custom.
What's next
Follow the next page of this tutorial to serve predictions from your trained ML model.

