InputDataConfig(mapping=None, *, ignore_unknown_fields=False, **kwargs)
Specifies Vertex AI owned input data to be used for training, and possibly evaluating, the Model.
This message has oneof fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
See https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
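The oneof rule can be illustrated without the library installed. The sketch below uses a hypothetical helper, check_oneofs, that inspects a plain dict of the kind that could be passed as mapping and reports every oneof group with more than one member set. The group and member names are taken from the Attributes list on this page; the helper is not part of google-cloud-aiplatform, and the real message class clears the other members instead of reporting a conflict.

```python
# Hypothetical helper, not part of google-cloud-aiplatform.
# Oneof groups and member names are taken from the Attributes list.
ONEOF_GROUPS = {
    "split": (
        "fraction_split",
        "filter_split",
        "predefined_split",
        "timestamp_split",
        "stratified_split",
    ),
    "destination": ("gcs_destination", "bigquery_destination"),
}


def check_oneofs(mapping):
    """Return {group: [members set]} for every group with 2+ members set."""
    conflicts = {}
    for group, members in ONEOF_GROUPS.items():
        present = [name for name in members if mapping.get(name) is not None]
        if len(present) > 1:
            conflicts[group] = present
    return conflicts


# Illustrative input; the sub-message contents are deliberately left empty.
config = {
    "dataset_id": "1234567890",
    "fraction_split": {},
    "timestamp_split": {},
    "gcs_destination": {},
}
print(check_oneofs(config))
```

Running a check like this before building the message makes an accidental double assignment visible, since the message itself would silently keep only the member that was set last.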
Attributes
fraction_split
google.cloud.aiplatform_v1.types.FractionSplit
Split based on fractions defining the size of each set. This field is a member of the oneof split.
filter_split
google.cloud.aiplatform_v1.types.FilterSplit
Split based on the provided filters for each set. This field is a member of the oneof split.
predefined_split
google.cloud.aiplatform_v1.types.PredefinedSplit
Supported only for tabular Datasets. Split based on a predefined key. This field is a member of the oneof split.
timestamp_split
google.cloud.aiplatform_v1.types.TimestampSplit
Supported only for tabular Datasets. Split based on the timestamp of the input data pieces. This field is a member of the oneof split.
stratified_split
google.cloud.aiplatform_v1.types.StratifiedSplit
Supported only for tabular Datasets. Split based on the distribution of the specified column. This field is a member of the oneof split.
gcs_destination
google.cloud.aiplatform_v1.types.GcsDestination
The Cloud Storage location where the training data is to be written to. In the given directory a new directory is created with name: dataset- where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory.
The Vertex AI environment variables representing Cloud Storage data URIs are represented in the Cloud Storage wildcard format to support sharded data, e.g. "gs://.../training-*.jsonl".
- AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
- AIP_TRAINING_DATA_URI = "gcs_destination/dataset---/training-*.${AIP_DATA_FORMAT}"
- AIP_VALIDATION_DATA_URI = "gcs_destination/dataset---/validation-*.${AIP_DATA_FORMAT}"
- AIP_TEST_DATA_URI = "gcs_destination/dataset---/test-*.${AIP_DATA_FORMAT}"
This field is a member of the oneof destination.
bigquery_destination
google.cloud.aiplatform_v1.types.BigQueryDestination
Only applicable to custom training with tabular Dataset with BigQuery source. The BigQuery project location where the training data is to be written to. In the given project a new dataset is created with name dataset_ where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. In the dataset three tables are created: training, validation and test.
- AIP_DATA_FORMAT = "bigquery"
- AIP_TRAINING_DATA_URI = "bigquery_destination.dataset\_\ **\ .training"
- AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset\_\ **\ .validation"
- AIP_TEST_DATA_URI = "bigquery_destination.dataset\_\ **\ .test"
This field is a member of the oneof destination.
dataset_id
str
Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the used TrainingPipeline's training_task_definition (google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition). For tabular Datasets, all their data is exported to training, to pick and choose from.
annotations_filter
str
Applicable only to Datasets that have DataItems and Annotations. A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used in the training, validation or test role, respectively, depending on the role of the DataItem they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, and not just within a single DataItem.
annotation_schema_uri
str
Applicable only to custom training with Datasets that have DataItems and Annotations. Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object.
saved_query_id
str
Only applicable to Datasets that have SavedQueries. The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id, used for filtering Annotations for training. Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter. Only one of saved_query_id and annotation_schema_uri should be specified, as both of them represent the same thing: the problem type.
persist_ml_use_assignment
bool
Whether to persist the ML use assignment to data item system labels.
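
Training code typically picks up the exported data locations through the AIP_* environment variables listed under gcs_destination and bigquery_destination. The following is a minimal sketch of reading them, assuming the variables are set as described there. The names read_data_uris and is_sharded, the returned dict layout, and the example bucket path are illustrative assumptions, not part of the library.

```python
import os


def read_data_uris(environ=None):
    """Collect the AIP_* data variables into a small dict.

    Pass a dict as `environ` to test without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return {
        "format": env.get("AIP_DATA_FORMAT"),
        "training": env.get("AIP_TRAINING_DATA_URI"),
        "validation": env.get("AIP_VALIDATION_DATA_URI"),
        "test": env.get("AIP_TEST_DATA_URI"),
    }


def is_sharded(uri):
    """True when a Cloud Storage URI uses the wildcard form, e.g. training-*.jsonl."""
    return uri is not None and uri.startswith("gs://") and "*" in uri


# Example with a fake environment; the bucket and directory names are made up.
fake_env = {
    "AIP_DATA_FORMAT": "jsonl",
    "AIP_TRAINING_DATA_URI": "gs://example-bucket/exported/training-*.jsonl",
}
uris = read_data_uris(fake_env)
print(uris["format"], is_sharded(uris["training"]), uris["validation"])
```

Variables that are not set come back as None, so the caller can decide whether a missing validation or test set is an error for its training job.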