Google Cloud Ai Platform V1 Client - Class InputDataConfig (1.4.0)

Reference documentation and code samples for the Google Cloud Ai Platform V1 Client class InputDataConfig.

Specifies Vertex AI owned input data to be used for training, and possibly evaluating, the Model.

Generated from protobuf message google.cloud.aiplatform.v1.InputDataConfig

Namespace

Google \ Cloud \ AIPlatform \ V1

Methods

__construct

Constructor.

Parameters

Name

Description

data

array

Optional. Data for populating the Message object.

↳ fraction_split

 Google\Cloud\AIPlatform\V1\FractionSplit

Split based on fractions defining the size of each set.

↳ filter_split

 Google\Cloud\AIPlatform\V1\FilterSplit

Split based on the provided filters for each set.

↳ predefined_split

 Google\Cloud\AIPlatform\V1\PredefinedSplit

Supported only for tabular Datasets. Split based on a predefined key.

↳ timestamp_split

 Google\Cloud\AIPlatform\V1\TimestampSplit

Supported only for tabular Datasets. Split based on the timestamp of the input data pieces.

↳ stratified_split

 Google\Cloud\AIPlatform\V1\StratifiedSplit

Supported only for tabular Datasets. Split based on the distribution of the specified column.

↳ gcs_destination

 Google\Cloud\AIPlatform\V1\GcsDestination

The Cloud Storage location where the training data is to be written to. In the given directory a new directory is created with name: dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call> where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory. The Vertex AI environment variables representing Cloud Storage data URIs are represented in the Cloud Storage wildcard format to support sharded data. e.g.: "gs://.../training-*.jsonl"

AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
AIP_TRAINING_DATA_URI = "gcs_destination/dataset- .${AIP_DATA_FORMAT}"
AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-
AIP_TEST_DATA_URI = "gcs_destination/dataset-

↳ bigquery_destination

 Google\Cloud\AIPlatform\V1\BigQueryDestination

Only applicable to custom training with tabular Dataset with BigQuery source. The BigQuery project location where the training data is to be written to. In the given project a new dataset is created with name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call> where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. In the dataset three tables are created: training, validation and test.

AIP_DATA_FORMAT = "bigquery".
AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_
AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_
AIP_TEST_DATA_URI = "bigquery_destination.dataset_

↳ dataset_id

string

Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the TrainingPipeline's training_task_definition (google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition). For tabular Datasets, all their data is exported to training, to pick and choose from.

↳ annotations_filter

string

Applicable only to Datasets that have DataItems and Annotations. A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, in the training, validation, or test role of the DataItem they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.

↳ annotation_schema_uri

string

Applicable only to custom training with Datasets that have DataItems and Annotations. Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, in the training, validation, or test role of the DataItem they are on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.

↳ saved_query_id

string

Only applicable to Datasets that have SavedQueries. The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id, used for filtering Annotations for training. Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter. Only one of saved_query_id and annotation_schema_uri should be specified, as both of them represent the same thing: problem type.

↳ persist_ml_use_assignment

bool

Whether to persist the ML use assignment to data item system labels.

getFractionSplit

Split based on fractions defining the size of each set.

Returns

Type

Description

Google\Cloud\AIPlatform\V1\FractionSplit|null
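To make "split based on fractions" concrete, here is a minimal Python sketch that is independent of the PHP client. The helper name split_by_fraction, the shuffling step, and the 0.8/0.1 fractions are assumptions chosen for illustration; this page does not describe how Vertex AI itself assigns items to each set.

```python
import random


def split_by_fraction(items, training=0.8, validation=0.1, seed=0):
    """Shuffle items and cut them into training/validation/test sets.

    Hypothetical helper for illustration only: whatever remains after the
    training and validation cuts becomes the test set.
    """
    if training < 0 or validation < 0 or training + validation > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(len(shuffled) * training)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


train, val, test = split_by_fraction(range(100))
print(len(train), len(val), len(test))
```

Every item lands in exactly one of the three sets, so the sizes always add up to the size of the input.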

hasFractionSplit

setFractionSplit

Split based on fractions defining the size of each set.

Parameter

Name

Description

var

 Google\Cloud\AIPlatform\V1\FractionSplit

Returns

Type

Description

$this

getFilterSplit

Split based on the provided filters for each set.

Returns

Type

Description

Google\Cloud\AIPlatform\V1\FilterSplit|null

hasFilterSplit

setFilterSplit

Split based on the provided filters for each set.

Parameter

Name

Description

var

 Google\Cloud\AIPlatform\V1\FilterSplit

Returns

Type

Description

$this

getPredefinedSplit

Supported only for tabular Datasets.

Split based on a predefined key.

Returns

Type

Description

Google\Cloud\AIPlatform\V1\PredefinedSplit|null
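As a rough illustration of a split driven by a predefined key, the Python sketch below groups rows by the value stored in a key column. It is independent of the PHP client. The column name ml_use, the helper split_by_key, and the rule that only the values "training", "validation" and "test" are recognised are assumptions made for this example.

```python
def split_by_key(rows, key):
    """Group rows into sets by the value stored under `key`.

    Assumption for this sketch: valid key values are "training",
    "validation" and "test"; rows with any other value are skipped.
    """
    sets = {"training": [], "validation": [], "test": []}
    for row in rows:
        role = row.get(key)
        if role in sets:
            sets[role].append(row)
    return sets


rows = [
    {"id": 1, "ml_use": "training"},
    {"id": 2, "ml_use": "test"},
    {"id": 3, "ml_use": "validation"},
    {"id": 4, "ml_use": "unknown"},  # not a recognised role, so skipped
]
sets = split_by_key(rows, "ml_use")
print({role: [r["id"] for r in members] for role, members in sets.items()})
```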

hasPredefinedSplit

setPredefinedSplit

Supported only for tabular Datasets.

Split based on a predefined key.

Parameter

Name

Description

var

 Google\Cloud\AIPlatform\V1\PredefinedSplit

Returns

Type

Description

$this

getTimestampSplit

Supported only for tabular Datasets.

Split based on the timestamp of the input data pieces.

Returns

Type

Description

Google\Cloud\AIPlatform\V1\TimestampSplit|null

hasTimestampSplit

setTimestampSplit

Supported only for tabular Datasets.

Split based on the timestamp of the input data pieces.

Parameter

Name

Description

var

 Google\Cloud\AIPlatform\V1\TimestampSplit

Returns

Type

Description

$this

getStratifiedSplit

Supported only for tabular Datasets.

Split based on the distribution of the specified column.

Returns

Type

Description

Google\Cloud\AIPlatform\V1\StratifiedSplit|null

hasStratifiedSplit

setStratifiedSplit

Supported only for tabular Datasets.

Split based on the distribution of the specified column.

Parameter

Name

Description

var

 Google\Cloud\AIPlatform\V1\StratifiedSplit

Returns

Type

Description

$this

getGcsDestination

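The Cloud Storage location where the training data is to be written to.

In the given directory a new directory is created with name: dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call> where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. All training input data is written into that directory. The Vertex AI environment variables representing Cloud Storage data URIs are represented in the Cloud Storage wildcard format to support sharded data. e.g.: "gs://.../training-*.jsonl"

AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
AIP_TRAINING_DATA_URI = "gcs_destination/dataset-
AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-
AIP_TEST_DATA_URI = "gcs_destination/dataset-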

Returns

Type

Description

Google\Cloud\AIPlatform\V1\GcsDestination|null
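The directory name pattern documented for gcs_destination, dataset-<dataset-id>-<annotation-type>-<timestamp-of-training-call> with a YYYY-MM-DDThh:mm:ss.sssZ timestamp, can be sketched in Python as follows. This is independent of the PHP client and only shows how such a name is shaped, for example when looking for the export directory afterwards. The helper name, the dataset ID, the annotation type, and the use of UTC for the "Z" suffix are assumptions.

```python
from datetime import datetime, timezone


def export_directory_name(dataset_id, annotation_type, when):
    """Build dataset-<dataset-id>-<annotation-type>-<timestamp>."""
    utc = when.astimezone(timezone.utc)
    # YYYY-MM-DDThh:mm:ss.sssZ: millisecond precision with a "Z" suffix.
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"
    return f"dataset-{dataset_id}-{annotation_type}-{stamp}"


name = export_directory_name(
    "1234567890",      # hypothetical dataset ID
    "classification",  # hypothetical annotation type
    datetime(2024, 1, 2, 3, 4, 5, 678000, tzinfo=timezone.utc),
)
print(name)  # dataset-1234567890-classification-2024-01-02T03:04:05.678Z
```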

hasGcsDestination

setGcsDestination

The Cloud Storage location where the training data is to be written to.

AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data
AIP_TRAINING_DATA_URI = "gcs_destination/dataset-
AIP_VALIDATION_DATA_URI = "gcs_destination/dataset-
AIP_TEST_DATA_URI = "gcs_destination/dataset-

Parameter

Name

Description

var

 Google\Cloud\AIPlatform\V1\GcsDestination

Returns

Type

Description

$this

getBigqueryDestination

Only applicable to custom training with tabular Dataset with BigQuery source.

The BigQuery project location where the training data is to be written to. In the given project a new dataset is created with name dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call> where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training input data is written into that dataset. In the dataset three tables are created: training, validation and test.

AIP_DATA_FORMAT = "bigquery".
AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_
AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_
AIP_TEST_DATA_URI = "bigquery_destination.dataset_

Returns

Type

Description

Google\Cloud\AIPlatform\V1\BigQueryDestination|null
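The BigQuery naming scheme documented for bigquery_destination uses underscores throughout: dataset_<dataset-id>_<annotation-type>_<timestamp-of-training-call>, with a YYYY_MM_DDThh_mm_ss_sssZ timestamp and three tables named training, validation and test. The Python sketch below, independent of the PHP client, shows the shape of those names. The helper name, the sample IDs, the use of UTC, and the dataset.table notation are assumptions for illustration.

```python
from datetime import datetime, timezone


def export_dataset_name(dataset_id, annotation_type, when):
    """Build dataset_<dataset-id>_<annotation-type>_<timestamp>."""
    utc = when.astimezone(timezone.utc)
    # YYYY_MM_DDThh_mm_ss_sssZ: the same fields as ISO-8601, joined by "_".
    stamp = utc.strftime("%Y_%m_%dT%H_%M_%S_") + f"{utc.microsecond // 1000:03d}Z"
    return f"dataset_{dataset_id}_{annotation_type}_{stamp}"


dataset = export_dataset_name(
    "1234567890",      # hypothetical dataset ID
    "classification",  # hypothetical annotation type
    datetime(2024, 1, 2, 3, 4, 5, 678000, tzinfo=timezone.utc),
)
# One table per role, written here in dataset.table form.
tables = [f"{dataset}.{role}" for role in ("training", "validation", "test")]
print(tables)
```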

hasBigqueryDestination

setBigqueryDestination

Only applicable to custom training with tabular Dataset with BigQuery source.

AIP_DATA_FORMAT = "bigquery".
AIP_TRAINING_DATA_URI = "bigquery_destination.dataset_
AIP_VALIDATION_DATA_URI = "bigquery_destination.dataset_
AIP_TEST_DATA_URI = "bigquery_destination.dataset_

Parameter

Name

Description

var

 Google\Cloud\AIPlatform\V1\BigQueryDestination

Returns

Type

Description

$this

getDatasetId

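Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the TrainingPipeline's training_task_definition (google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition). For tabular Datasets, all their data is exported to training, to pick and choose from.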

Returns

Type

Description

string

setDatasetId

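Required. The ID of the Dataset in the same Project and Location whose data will be used to train the Model. The Dataset must use a schema compatible with the Model being trained; what is compatible should be described in the TrainingPipeline's training_task_definition (google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition). For tabular Datasets, all their data is exported to training, to pick and choose from.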

Parameter

Name

Description

var

string

Returns

Type

Description

$this

getAnnotationsFilter

Applicable only to Datasets that have DataItems and Annotations.

A filter on Annotations of the Dataset. Only Annotations that both match this filter and belong to DataItems not ignored by the split method are used, in the training, validation, or test role of the DataItem they are on (for auto-assigned DataItems, that role is decided by Vertex AI). A filter with the same syntax as the one used in ListAnnotations may be used, but note that here it filters across all Annotations of the Dataset, not just within a single DataItem.

Returns

Type

Description

string

setAnnotationsFilter

Applicable only to Datasets that have DataItems and Annotations.

Parameter

Name

Description

var

string

Returns

Type

Description

$this

getAnnotationSchemaUri

Applicable only to custom training with Datasets that have DataItems and Annotations.

Cloud Storage URI that points to a YAML file describing the annotation schema. The schema is defined as an OpenAPI 3.0.2 Schema Object. The schema files that can be used here are found in gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the chosen schema must be consistent with the metadata of the Dataset specified by dataset_id. Only Annotations that both match this schema and belong to DataItems not ignored by the split method are used, in the training, validation, or test role of the DataItem they are on. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both annotations_filter and annotation_schema_uri.

Returns

Type

Description

string

setAnnotationSchemaUri

Applicable only to custom training with Datasets that have DataItems and Annotations.

Parameter

Name

Description

var

string

Returns

Type

Description

$this

getSavedQueryId

Only applicable to Datasets that have SavedQueries.

The ID of a SavedQuery (annotation set) under the Dataset specified by dataset_id, used for filtering Annotations for training. Only Annotations that are associated with this SavedQuery are used in training. When used in conjunction with annotations_filter, the Annotations used for training are filtered by both saved_query_id and annotations_filter. Only one of saved_query_id and annotation_schema_uri should be specified, as both of them represent the same thing: problem type.

Returns

Type

Description

string

setSavedQueryId

Only applicable to Datasets that have SavedQueries.

Parameter

Name

Description

var

string

Returns

Type

Description

$this

getPersistMlUseAssignment

Whether to persist the ML use assignment to data item system labels.

Returns

Type

Description

bool

setPersistMlUseAssignment

Whether to persist the ML use assignment to data item system labels.

Parameter

Name

Description

var

bool

Returns

Type

Description

$this

getSplit

Returns

Type

Description

string

getDestination

Returns

Type

Description

string
