Hello custom training: Train a custom image classification model

This page shows you how to run a TensorFlow Keras training application on Vertex AI. This particular application trains an image classification model that classifies flowers by type.

This tutorial has several pages:
  1. Setting up your project and environment.

  2. Training a custom image classification model.

  3. Serving predictions from a custom image classification model.

  4. Cleaning up your project.

Each page assumes that you have already performed the instructions from the previous pages of the tutorial.

The rest of this document assumes that you are using the same Cloud Shell environment that you created when following the first page of this tutorial. If your original Cloud Shell session is no longer open, you can return to the environment by doing the following:
  1. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

  2. In the Cloud Shell session, run the following command:

     cd hello-custom-sample

Run a custom training pipeline

This section describes how to use the training package that you uploaded to Cloud Storage to run a Vertex AI custom training pipeline.
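
If you prefer to script this step from Cloud Shell rather than click through the console, the Vertex AI SDK for Python can create a roughly equivalent pipeline. The following is a minimal sketch, not the tutorial's prescribed method: PROJECT_ID and BUCKET_NAME are placeholders, and the prebuilt container image URIs are assumptions that you should check against the framework versions you select in the console steps below.

    from google.cloud import aiplatform

    aiplatform.init(project="PROJECT_ID", location="us-central1")

    # Mirrors the console walkthrough: prebuilt TensorFlow 2.3 training and
    # prediction containers, the sample training package in Cloud Storage,
    # and the trainer.task entry-point module.
    job = aiplatform.CustomPythonPackageTrainingJob(
        display_name="hello_custom",
        python_package_gcs_uri=(
            "gs://cloud-samples-data/ai-platform/hello-custom/"
            "hello-custom-sample-v1.tar.gz"
        ),
        python_module_name="trainer.task",
        container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-3:latest",
        model_serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-3:latest"
        ),
    )

    # base_output_dir corresponds to the Model output directory field
    # described below.
    model = job.run(
        model_display_name="hello_custom",
        machine_type="n1-standard-4",
        replica_count=1,
        base_output_dir="gs://BUCKET_NAME/output",
    )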

  1. In the Google Cloud console, in the Vertex AI section, go to the Training pipelines page.

    Go to Training pipelines

  2. Click Create to open the Train new model pane.

  3. On the Choose training method step, do the following:

    1. In the Dataset drop-down list, select No managed dataset. This particular training application loads data from the TensorFlow Datasets library rather than a managed Vertex AI dataset.

    2. Ensure that Custom training (advanced) is selected.

    Click Continue.

  4. On the Model details step, in the Name field, enter hello_custom. Click Continue.

  5. On the Training container step, provide Vertex AI with information it needs to use the training package that you uploaded to Cloud Storage:

    1. Select Prebuilt container.

    2. In the Model framework drop-down list, select TensorFlow.

    3. In the Model framework version drop-down list, select 2.3.

    4. In the Package location field, enter cloud-samples-data/ai-platform/hello-custom/hello-custom-sample-v1.tar.gz.

    5. In the Python module field, enter trainer.task. trainer is the name of the Python package in your tarball, and task.py contains your training code. Therefore, trainer.task is the name of the module that you want Vertex AI to run. (A sketch of this package layout appears after these steps.)

    6. In the Model output directory field, click Browse. Do the following in the Select folder pane:

      1. Navigate to your Cloud Storage bucket.

      2. Click Create new folder.

      3. Name the new folder output. Then click Create.

      4. Click Select.

      Confirm that the field has the value gs://BUCKET_NAME/output, where BUCKET_NAME is the name of your Cloud Storage bucket.

      This value gets passed to Vertex AI in the baseOutputDirectory API field, which sets several environment variables that your training application can access when it runs.

      For example, when you set this field to gs://BUCKET_NAME/output, Vertex AI sets the AIP_MODEL_DIR environment variable to gs://BUCKET_NAME/output/model. At the end of training, Vertex AI uses any artifacts in the AIP_MODEL_DIR directory to create a model resource. (See the training-script sketch after these steps.)

      Learn more about the environment variables set by this field.

    Click Continue.

  6. On the optional Hyperparameters step, make sure that the Enable hyperparameter tuning checkbox is cleared. This tutorial does not use hyperparameter tuning. Click Continue.

  7. On the Compute and pricing step, allocate resources for the custom training job:

    1. In the Region drop-down list, select us-central1 (Iowa).

    2. In the Machine type drop-down list, select n1-standard-4 from the Standard section.

    Do not add any accelerators or worker pools for this tutorial. Click Continue.

  8. On the Prediction container step, provide Vertex AI with information it needs to serve predictions:

    1. Select Prebuilt container.

    2. In the Prebuilt container settings section, do the following:

      1. In the Model framework drop-down list, select TensorFlow.

      2. In the Model framework version drop-down list, select 2.3.

      3. In the Accelerator type drop-down list, select None.

      4. Confirm that the Model directory field has the value gs://BUCKET_NAME/output, where BUCKET_NAME is the name of your Cloud Storage bucket. This matches the Model output directory value that you provided in a previous step.

    3. Leave the fields in the Predict schemata section blank.

  9. Click Start training to start the custom training pipeline.
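
As referenced in step 5, the training package is an ordinary Python source distribution. The following is a minimal sketch of a setup.py and source layout under which trainer.task is a valid module name; the file contents here are assumptions for illustration, not the actual sample's code.

    # setup.py, at the root of the source tree:
    #
    #     setup.py
    #     trainer/__init__.py
    #     trainer/task.py    <- training code, run as `python -m trainer.task`
    #
    # Running `python setup.py sdist` over this layout produces a tarball
    # like hello-custom-sample-v1.tar.gz.
    from setuptools import find_packages, setup

    setup(
        name="trainer",
        version="0.1",
        packages=find_packages(),
        description="Hello custom training sample.",
    )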
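
And as noted in the Model output directory step, Vertex AI exposes the output location to your code through environment variables. Here is a hypothetical excerpt showing how a training script such as trainer/task.py might use AIP_MODEL_DIR; it is illustrative, not the sample's actual code.

    import os

    import tensorflow as tf

    # Vertex AI derives AIP_MODEL_DIR from the baseOutputDirectory you set,
    # e.g. gs://BUCKET_NAME/output/model. A local fallback lets the same
    # script run outside Vertex AI.
    model_dir = os.environ.get("AIP_MODEL_DIR", "local-model")

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(180, 180, 3)),
        tf.keras.layers.Dense(5, activation="softmax"),  # 5 flower classes
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # ... load the flowers dataset and call model.fit() here ...

    # Saving to AIP_MODEL_DIR is what lets the training pipeline find the
    # artifacts and create a model resource after the job finishes.
    model.save(model_dir)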

You can now view your new training pipeline, which is named hello_custom, on the Training page. (You might need to refresh the page.) The training pipeline does two main things:

  1. The training pipeline creates a custom job resource named hello_custom-custom-job. After a few moments, you can view this resource on the Custom jobs page of the Training section:

    Go to Custom jobs

    The custom job runs the training application using the computing resources that you specified in this section.

  2. After the custom job completes, the training pipeline finds the artifacts that your training application creates in the output/model/ directory of your Cloud Storage bucket. It uses these artifacts to create a model resource.

Monitor training

To view training logs, do the following:

  1. In the Google Cloud console, in the Vertex AI section, go to the Custom jobs page.

    Go to Custom jobs

  2. To view details for the CustomJob that you just created, click hello_custom-custom-job in the list.

  3. On the job details page, click View logs.
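
If you want to check job status from Cloud Shell instead of the console, the following short sketch uses the Vertex AI SDK for Python (PROJECT_ID is a placeholder):

    from google.cloud import aiplatform

    aiplatform.init(project="PROJECT_ID", location="us-central1")

    # hello_custom-custom-job should appear here once the training pipeline
    # has created it; the state shows whether it is still running.
    for job in aiplatform.CustomJob.list():
        print(job.display_name, job.state)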

View your trained model

When the custom training pipeline completes, you can find the trained model in the Google Cloud console, in the Vertex AI section, on the Models page.

Go to Models

The model has the name hello_custom.
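
You can also look the model up programmatically. A sketch with the Vertex AI SDK for Python, again with PROJECT_ID as a placeholder:

    from google.cloud import aiplatform

    aiplatform.init(project="PROJECT_ID", location="us-central1")

    # Find the model that the training pipeline created, by display name.
    for model in aiplatform.Model.list(filter='display_name="hello_custom"'):
        print(model.display_name, model.resource_name)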

What's next

Follow the next page of this tutorial to serve predictions from your trained ML model.
