Endpoint

```python
Endpoint(
    endpoint_name: str,
    project: Optional[str] = None,
    location: Optional[str] = None,
    credentials: Optional[google.auth.credentials.Credentials] = None,
)
```
Retrieves an endpoint resource.

Parameters

- endpoint_name (str): Required. A fully-qualified endpoint resource name or endpoint ID. Example: "projects/123/locations/us-central1/endpoints/456", or "456" when project and location are initialized or passed.
- project (str): Optional. Project to retrieve the endpoint from. If not set, the project set in aiplatform.init will be used.
- location (str): Optional. Location to retrieve the endpoint from. If not set, the location set in aiplatform.init will be used.
- credentials (auth_credentials.Credentials): Optional. Custom credentials to use to retrieve this endpoint. Overrides credentials set in aiplatform.init.
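A minimal retrieval sketch; the project, region, and endpoint ID below are hypothetical placeholders:

```python
from google.cloud import aiplatform

# Placeholder values; substitute your own project, region, and endpoint ID.
aiplatform.init(project="my-project", location="us-central1")

# Retrieve by short ID (project and location come from aiplatform.init) ...
endpoint = aiplatform.Endpoint("456")

# ... or by fully-qualified resource name.
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/456"
)
```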
Inheritance

builtins.object > google.cloud.aiplatform.base.VertexAiResourceNoun > builtins.object > google.cloud.aiplatform.base.FutureManager > google.cloud.aiplatform.base.VertexAiResourceNounWithFutureManager > builtins.object > abc.ABC > google.cloud.aiplatform.base.PreviewMixin > Endpoint

Properties
- create_time: Time this resource was created.
- display_name: Display name of this resource.
- encryption_spec: Customer-managed encryption key options for this Vertex AI resource. If this is set, then all resources created by this Vertex AI resource will be encrypted with the provided encryption key.
- gca_resource: The underlying resource proto representation.
- labels: User-defined labels containing metadata about this resource. Read more about labels at https://goo.gl/xmQnxf
- name: Name of this resource.
- network: The full name of the Google Compute Engine network to which this Endpoint should be peered. Takes the format projects/{project}/global/networks/{network}, where {project} is a project number (as in 12345) and {network} is a network name. Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network.
- preview: Return an Endpoint instance with preview features enabled.
- resource_name: Fully-qualified resource name.
- traffic_split: A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment.
- update_time: Time this resource was last updated.
Methods
create

```python
create(
    display_name: Optional[str] = None,
    description: Optional[str] = None,
    labels: Optional[Dict[str, str]] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    project: Optional[str] = None,
    location: Optional[str] = None,
    credentials: Optional[google.auth.credentials.Credentials] = None,
    encryption_spec_key_name: Optional[str] = None,
    sync=True,
    create_request_timeout: Optional[float] = None,
    endpoint_id: Optional[str] = None,
    enable_request_response_logging=False,
    request_response_logging_sampling_rate: Optional[float] = None,
    request_response_logging_bq_destination_table: Optional[str] = None,
)
```
Creates a new endpoint.

- display_name (str): Optional. The user-defined name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.
- description (str): Optional. The description of the Endpoint.
- labels (Dict[str, str]): Optional. The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- metadata (Sequence[Tuple[str, str]]): Optional. Strings which should be sent along with the request as metadata.
- project (str): Optional. Project to create the endpoint in. If not set, the project set in aiplatform.init will be used.
- location (str): Optional. Location to create the endpoint in. If not set, the location set in aiplatform.init will be used.
- credentials (auth_credentials.Credentials): Optional. Custom credentials to use to create this endpoint. Overrides credentials set in aiplatform.init.
- encryption_spec_key_name (str): Optional. The Cloud KMS resource identifier of the customer-managed encryption key used to protect this resource. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key. Overrides encryption_spec_key_name set in aiplatform.init.
- create_request_timeout (float): Optional. The timeout for the create request in seconds.
- endpoint_id (str): Optional. The ID to use for the endpoint, which will become the final component of the endpoint resource name. If not provided, Vertex AI will generate a value for this ID. This value should be 1-10 characters, and valid characters are /[0-9]/. When using HTTP/JSON, this field is populated based on a query string argument, such as ?endpoint_id=12345. This is the fallback for fields that are not included in either the URI or the body.
- request_response_logging_sampling_rate (float): Optional. The request/response logging sampling rate. If not set, the default is 0.0.
- request_response_logging_bq_destination_table (str): Optional. The request/response logging BigQuery destination. If not set, a table will be created with the name: bq://{project_id}.logging_{endpoint_display_name}_{endpoint_id}.request_response_logging.
- sync (bool): Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
- enable_request_response_logging (bool): Optional. Whether to enable request and response logging for this endpoint.

Returns: endpoint (aiplatform.Endpoint)
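A minimal creation sketch, assuming aiplatform.init has already been called; the display name and labels are illustrative:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder values

# Create an endpoint synchronously; returns once the operation completes.
endpoint = aiplatform.Endpoint.create(
    display_name="my-endpoint",
    labels={"env": "dev"},
)
print(endpoint.resource_name)
```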
delete

```python
delete(
    force: bool = False,
    sync: bool = True,
)
```
Deletes this Vertex AI Endpoint resource. If force is set to True, all models on this Endpoint will be undeployed prior to deletion.

- force (bool): Optional. If set to True, all deployed models on this Endpoint will be undeployed first. Defaults to False.
- sync (bool): Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.

Raises: FailedPrecondition - If models are deployed on this Endpoint and force is set to False.
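A minimal teardown sketch, reusing the endpoint object from the earlier examples:

```python
# Force-delete: undeploys any remaining models, then deletes the endpoint.
endpoint.delete(force=True)
```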
deploy

```python
deploy(
    model: google.cloud.aiplatform.models.Model,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: int = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    machine_type: Optional[str] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    service_account: Optional[str] = None,
    explanation_metadata: Optional[google.cloud.aiplatform_v1.types.explanation_metadata.ExplanationMetadata] = None,
    explanation_parameters: Optional[google.cloud.aiplatform_v1.types.explanation.ExplanationParameters] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync=True,
    deploy_request_timeout: Optional[float] = None,
    autoscaling_target_cpu_utilization: Optional[int] = None,
    autoscaling_target_accelerator_duty_cycle: Optional[int] = None,
    enable_access_logging=False,
)
```
Deploys a Model to the Endpoint.

- deployed_model_display_name (str): Optional. The display name of the DeployedModel. If not provided upon creation, the Model's display_name is used.
- traffic_percentage (int): Optional. Desired traffic to the newly deployed model. Defaults to 0 if there are pre-existing deployed models, and to 100 if there are none. Negative values should not be provided. Traffic of previously deployed models at the endpoint will be scaled down to accommodate the new deployed model's traffic. Should not be provided if traffic_split is provided.
- traffic_split (Dict[str, int]): Optional. A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment. The key for the model being deployed is "0". Should not be provided if traffic_percentage is provided.
- machine_type (str): Optional. The type of machine. Not specifying a machine type will result in the model being deployed with automatic resources.
- min_replica_count (int): Optional. The minimum number of machine replicas this deployed model will always be deployed on. If traffic against it increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.
- max_replica_count (int): Optional. The maximum number of replicas this deployed model may be deployed on when traffic against it increases. If the requested value is too large, the deployment will error, but if the deployment succeeds then the ability to scale the model to that many replicas is guaranteed (barring service outages). If traffic against the deployed model increases beyond what its replicas at maximum can handle, a portion of the traffic will be dropped. If this value is not provided, the larger of min_replica_count or 1 will be used. If the value provided is smaller than min_replica_count, it will automatically be increased to min_replica_count.
- accelerator_type (str): Optional. Hardware accelerator type. Must also set accelerator_count if used. One of ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4.
- accelerator_count (int): Optional. The number of accelerators to attach to a worker replica.
- service_account (str): The service account that the DeployedModel's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project. Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.
- explanation_metadata (aiplatform.explain.ExplanationMetadata): Optional. Metadata describing the Model's input and output for explanation. explanation_metadata is optional while explanation_parameters must be specified when used. For more details, see the reference docs at http://tinyurl.com/1igh60kt
- explanation_parameters (aiplatform.explain.ExplanationParameters): Optional. Parameters to configure explaining for the Model's predictions. For more details, see the reference docs at http://tinyurl.com/1an4zake
- metadata (Sequence[Tuple[str, str]]): Optional. Strings which should be sent along with the request as metadata.
- deploy_request_timeout (float): Optional. The timeout for the deploy request in seconds.
- autoscaling_target_cpu_utilization (int): Optional. Target CPU utilization to use for autoscaling replicas. A default value of 60 will be used if not specified.
- autoscaling_target_accelerator_duty_cycle (int): Optional. Target accelerator duty cycle. Must also set accelerator_type and accelerator_count if specified. A default value of 60 will be used if not specified.
- model (aiplatform.Model): Required. Model to be deployed.
- sync (bool): Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
- enable_access_logging (bool): Whether to enable endpoint access logging. Defaults to False.
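A deployment sketch, reusing the endpoint object from the earlier examples; the model resource name, display name, and machine type are hypothetical placeholders:

```python
from google.cloud import aiplatform

# Placeholder model resource name; substitute a real model ID.
model = aiplatform.Model("projects/my-project/locations/us-central1/models/789")

# Deploy onto dedicated resources and route all traffic to the new model.
endpoint.deploy(
    model=model,
    deployed_model_display_name="my-deployed-model",
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=2,
    traffic_percentage=100,
)
```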
explain

```python
explain(
    instances: List[Dict],
    parameters: Optional[Dict] = None,
    deployed_model_id: Optional[str] = None,
    timeout: Optional[float] = None,
)
```
Make a prediction with explanations against this Endpoint.

Example usage:

```python
response = my_endpoint.explain(instances=[...])
my_explanations = response.explanations
```

- instances (List): Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request; when it is exceeded, the prediction call errors for AutoML Models, and for customer-created Models the behaviour is as documented by that Model. The schema of any single instance may be specified via the Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] instance_schema_uri.
- parameters (Dict): The parameters that govern the prediction. The schema of the parameters may be specified via the Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] parameters_schema_uri.
- deployed_model_id (str): Optional. If specified, this ExplainRequest will be served by the chosen DeployedModel, overriding this Endpoint's traffic split.
- timeout (float): Optional. The timeout for this request in seconds.

Returns: prediction (aiplatform.Prediction)
list

```python
list(
    filter: Optional[str] = None,
    order_by: Optional[str] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
    credentials: Optional[google.auth.credentials.Credentials] = None,
)
```
List all Endpoint resource instances.

Example usage:

```python
aiplatform.Endpoint.list(
    filter='labels.my_label="my_label_value" OR display_name!="old_endpoint"',
)
```

- filter (str): Optional. An expression for filtering the results of the request. For field names, both snake_case and camelCase are supported.
- order_by (str): Optional. A comma-separated list of fields to order by, sorted in ascending order. Use "desc" after a field name for descending. Supported fields: display_name, create_time, update_time.
- project (str): Optional. Project to retrieve the list from. If not set, the project set in aiplatform.init will be used.
- location (str): Optional. Location to retrieve the list from. If not set, the location set in aiplatform.init will be used.
- credentials (auth_credentials.Credentials): Optional. Custom credentials to use to retrieve the list. Overrides credentials set in aiplatform.init.

Returns: List[models.Endpoint]
list_models

```python
list_models()
```

Returns a list of the models deployed to this Endpoint.

Returns: deployed_models (List[aiplatform.gapic.DeployedModel])
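A short inspection sketch, reusing the endpoint object from the earlier examples:

```python
# Print each deployed model's ID and display name.
for deployed_model in endpoint.list_models():
    print(deployed_model.id, deployed_model.display_name)
```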
predict

```python
predict(
    instances: List,
    parameters: Optional[Dict] = None,
    timeout: Optional[float] = None,
    use_raw_predict: Optional[bool] = False,
)
```
Make a prediction against this Endpoint.

- instances (List): Required. The instances that are the input to the prediction call. A DeployedModel may have an upper limit on the number of instances it supports per request; when it is exceeded, the prediction call errors for AutoML Models, and for customer-created Models the behaviour is as documented by that Model. The schema of any single instance may be specified via the Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] instance_schema_uri.
- parameters (Dict): The parameters that govern the prediction. The schema of the parameters may be specified via the Endpoint's DeployedModels' [Model's][google.cloud.aiplatform.v1beta1.DeployedModel.model] [PredictSchemata's][google.cloud.aiplatform.v1beta1.Model.predict_schemata] parameters_schema_uri.
- timeout (float): Optional. The timeout for this request in seconds.
- use_raw_predict (bool): Optional. Defaults to False. If set to True, the underlying prediction call will be made against Endpoint.raw_predict().

Returns: prediction (aiplatform.Prediction)
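A prediction sketch, reusing the endpoint object from above; the instance payload is hypothetical and must match your model's input schema:

```python
# Feature names and values are placeholders for your model's instance schema.
response = endpoint.predict(instances=[{"feat_1": 1.0, "feat_2": 2.0}])
print(response.predictions)
```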
raw_predict

```python
raw_predict(
    body: bytes,
    headers: Dict[str, str],
)
```
Makes a prediction request using arbitrary headers.

Example usage:

```python
import json

from google.cloud import aiplatform

my_endpoint = aiplatform.Endpoint(ENDPOINT_ID)
response = my_endpoint.raw_predict(
    body=b'{"instances": [{"feat_1": val_1, "feat_2": val_2}]}',
    headers={"Content-Type": "application/json"},
)
status_code = response.status_code
results = json.loads(response.text)
```

- body (bytes): The body of the prediction request in bytes. This must not exceed 1.5 MB per request.
- headers (Dict[str, str]): The headers of the request as a dictionary. There are no restrictions on the headers.
to_dict

```python
to_dict()
```

Returns the resource proto as a dictionary.
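A small inspection sketch; the default=str fallback is a defensive assumption for any non-JSON-native field values, not part of the API:

```python
import json

# Serialize the endpoint's resource representation for logging or debugging.
print(json.dumps(endpoint.to_dict(), indent=2, default=str))
```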
undeploy

```python
undeploy(
    deployed_model_id: str,
    traffic_split: Optional[Dict[str, int]] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync=True,
)
```
Undeploys a deployed model.

The model to be undeployed should have no traffic, or the user must provide a new traffic_split for the remaining deployed models. Refer to Endpoint.traffic_split for the current traffic split mapping.

- deployed_model_id (str): Required. The ID of the DeployedModel to be undeployed from the Endpoint.
- traffic_split (Dict[str, int]): Optional. A map of DeployedModel IDs to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. Required if undeploying a model with non-zero traffic from an Endpoint with multiple deployed models. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment. If a DeployedModel's ID is not listed in this map, it receives no traffic.
- metadata (Sequence[Tuple[str, str]]): Optional. Strings which should be sent along with the request as metadata.
- sync (bool): Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
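A sketch of shifting traffic and undeploying; the DeployedModel IDs are placeholders:

```python
# Suppose two models are deployed: move all traffic to one, undeploy the other.
endpoint.undeploy(
    deployed_model_id="1234567890",      # placeholder ID of the model to remove
    traffic_split={"0987654321": 100},   # remaining model takes all traffic
)
```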
undeploy_all

```python
undeploy_all(
    sync: bool = True,
)
```

Undeploys every model deployed to this Endpoint.

- sync (bool): Whether to execute this method synchronously. If False, this method will be executed in a concurrent Future and any downstream object will be immediately returned and synced when the Future has completed.
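As a gentler alternative to delete(force=True), a sketch of a graceful teardown:

```python
# Remove every deployed model first, then delete the now-empty endpoint.
endpoint.undeploy_all()
endpoint.delete()
```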
update

```python
update(
    display_name: Optional[str] = None,
    description: Optional[str] = None,
    labels: Optional[Dict[str, str]] = None,
    traffic_split: Optional[Dict[str, int]] = None,
    request_metadata: Optional[Sequence[Tuple[str, str]]] = (),
    update_request_timeout: Optional[float] = None,
)
```
Updates an endpoint.

Example usage:

```python
my_endpoint = my_endpoint.update(
    display_name="my-updated-endpoint",
    description="my updated description",
    labels={"key": "value"},
    traffic_split={
        "123456": 20,
        "234567": 80,
    },
)
```

- display_name (str): Optional. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.
- description (str): Optional. The description of the Endpoint.
- labels (Dict[str, str]): Optional. The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
- traffic_split (Dict[str, int]): Optional. A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment.
- request_metadata (Sequence[Tuple[str, str]]): Optional. Strings which should be sent along with the request as metadata.
- update_request_timeout (float): Optional. The timeout for the update request in seconds.

Raises: ValueError - If labels is not in the correct format.

Returns: Endpoint - The updated endpoint resource.
wait

```python
wait()
```

Helper method that blocks until all futures are complete.
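A sketch of the asynchronous pattern that wait() supports, using sync=False; the display name is a placeholder:

```python
from google.cloud import aiplatform

# Kick off creation without blocking; work continues in a background Future.
endpoint = aiplatform.Endpoint.create(display_name="my-endpoint", sync=False)

# ... do other work ...

# Block until the underlying operation finishes before using the resource.
endpoint.wait()
print(endpoint.resource_name)
```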