You pass the training script to a custom training job and use the job's run method to run the script. In this topic, you create the training script, then specify command arguments for your training script.
Create a training script
In this section, you create a training script. This script is a new file in your notebook environment named task.py. Later in this tutorial, you pass this script to the aiplatform.CustomTrainingJob constructor. When the script runs, it does the following:
- Loads the data in the BigQuery dataset you created.
- Uses the TensorFlow Keras API to build, compile, and train your model.
- Specifies the number of epochs and the batch size to use when the Keras Model.fit method is invoked.
- Specifies where to save model artifacts using the AIP_MODEL_DIR environment variable. AIP_MODEL_DIR is set by Vertex AI and contains the URI of a directory for saving model artifacts. For more information, see Environment variables for special Cloud Storage directories.
- Exports a TensorFlow SavedModel to the model directory. For more information, see Using the SavedModel format on the TensorFlow website.
To create your training script, run the following code in your notebook:
%%writefile task.py

import argparse
import numpy as np
import os

import pandas as pd
import tensorflow as tf

from google.cloud import bigquery
from google.cloud import storage

# Read environmental variables
training_data_uri = os.getenv("AIP_TRAINING_DATA_URI")
validation_data_uri = os.getenv("AIP_VALIDATION_DATA_URI")
test_data_uri = os.getenv("AIP_TEST_DATA_URI")

# Read args
parser = argparse.ArgumentParser()
parser.add_argument('--label_column', required=True, type=str)
parser.add_argument('--epochs', default=10, type=int)
parser.add_argument('--batch_size', default=10, type=int)
args = parser.parse_args()

# Set up training variables
LABEL_COLUMN = args.label_column

# See https://cloud.google.com/vertex-ai/docs/workbench/managed/executor#explicit-project-selection for issues regarding permissions.
PROJECT_NUMBER = os.environ["CLOUD_ML_PROJECT_ID"]
bq_client = bigquery.Client(project=PROJECT_NUMBER)


# Download a table
def download_table(bq_table_uri: str):
    # Remove bq:// prefix if present
    prefix = "bq://"
    if bq_table_uri.startswith(prefix):
        bq_table_uri = bq_table_uri[len(prefix):]

    # Download the BigQuery table as a dataframe
    # This requires the "BigQuery Read Session User" role on the custom training service account.
    table = bq_client.get_table(bq_table_uri)
    return bq_client.list_rows(table).to_dataframe()


# Download dataset splits
df_train = download_table(training_data_uri)
df_validation = download_table(validation_data_uri)
df_test = download_table(test_data_uri)


def convert_dataframe_to_dataset(
    df_train: pd.DataFrame,
    df_validation: pd.DataFrame,
):
    df_train_x, df_train_y = df_train, df_train.pop(LABEL_COLUMN)
    df_validation_x, df_validation_y = df_validation, df_validation.pop(LABEL_COLUMN)

    y_train = tf.convert_to_tensor(np.asarray(df_train_y).astype("float32"))
    y_validation = tf.convert_to_tensor(np.asarray(df_validation_y).astype("float32"))

    # Convert to numpy representation
    x_train = tf.convert_to_tensor(np.asarray(df_train_x).astype("float32"))
    x_test = tf.convert_to_tensor(np.asarray(df_validation_x).astype("float32"))

    # Convert to one-hot representation
    num_species = len(df_train_y.unique())
    y_train = tf.keras.utils.to_categorical(y_train, num_classes=num_species)
    y_validation = tf.keras.utils.to_categorical(y_validation, num_classes=num_species)

    dataset_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    dataset_validation = tf.data.Dataset.from_tensor_slices((x_test, y_validation))
    return (dataset_train, dataset_validation)


# Create datasets
dataset_train, dataset_validation = convert_dataframe_to_dataset(df_train, df_validation)

# Shuffle train set
dataset_train = dataset_train.shuffle(len(df_train))


def create_model(num_features):
    # Create model
    Dense = tf.keras.layers.Dense
    model = tf.keras.Sequential(
        [
            Dense(
                100,
                activation=tf.nn.relu,
                kernel_initializer="uniform",
                input_dim=num_features,
            ),
            Dense(75, activation=tf.nn.relu),
            Dense(50, activation=tf.nn.relu),
            Dense(25, activation=tf.nn.relu),
            Dense(3, activation=tf.nn.softmax),
        ]
    )

    # Compile Keras model
    optimizer = tf.keras.optimizers.RMSprop(lr=0.001)
    model.compile(
        loss="categorical_crossentropy", metrics=["accuracy"], optimizer=optimizer
    )

    return model


# Create the model
model = create_model(num_features=dataset_train._flat_shapes[0].dims[0].value)

# Set up datasets
dataset_train = dataset_train.batch(args.batch_size)
dataset_validation = dataset_validation.batch(args.batch_size)

# Train the model
model.fit(dataset_train, epochs=args.epochs, validation_data=dataset_validation)

tf.saved_model.save(model, os.getenv("AIP_MODEL_DIR"))
After you create the script, it appears in the root folder of your notebook.
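If you want to confirm that the %%writefile magic wrote the file, you can list it from a notebook cell. This is an optional check that assumes a Jupyter environment, where ! runs a shell command:

# Optional: verify that task.py exists in the notebook's root folder.
!ls task.py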
Define arguments for your training script
You pass the following command-line arguments to your training script:
- label_column - This identifies the column in your data that contains what you want to predict. In this case, that column is species. You defined this in a variable named LABEL_COLUMN when you processed your data. For more information, see Download, preprocess, and split the data.
- epochs - This is the number of epochs used when you train your model. An epoch is one pass over the training data. This tutorial uses 20 epochs.
- batch_size - This is the number of samples that are processed before your model updates. This tutorial uses a batch size of 10.
To define the arguments that are passed to your script, run the following code:
JOB_NAME = "custom_job_unique"
EPOCHS = 20
BATCH_SIZE = 10
CMDARGS = [
"--label_column=" + LABEL_COLUMN,
"--epochs=" + str(EPOCHS),
"--batch_size=" + str(BATCH_SIZE),
]
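These values match the flags that task.py defines with argparse: label_column is required, while epochs and batch_size override the script's defaults of 10. As a preview of how they are used, the following sketch shows roughly how the script and CMDARGS are passed to a custom training job and run. The container image and the dataset variable (the managed BigQuery dataset you created earlier) are assumptions here; a later part of this tutorial gives the exact call to use.

from google.cloud import aiplatform

# A rough sketch only. The container image and the `dataset` variable (the managed
# BigQuery dataset created earlier) are assumptions, not the tutorial's exact values.
job = aiplatform.CustomTrainingJob(
    display_name=JOB_NAME,
    script_path="task.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",
)

# run() executes task.py on Vertex AI and passes CMDARGS as command-line arguments.
# Passing the managed dataset is what causes Vertex AI to set the AIP_TRAINING_DATA_URI,
# AIP_VALIDATION_DATA_URI, and AIP_TEST_DATA_URI environment variables the script reads.
job.run(
    dataset=dataset,
    args=CMDARGS,
    replica_count=1,
)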