Get TabNet online inferences

This page shows you how to get online (real-time) inferences and explanations from your tabular classification or regression models using the Google Cloud console or the Vertex AI API.

An online inference is a synchronous request as opposed to a batch inference, which is an asynchronous request. Use online inferences when you are making requests in response to application input or in other situations where you require timely inference.

You must deploy a model to an endpoint before that model can be used to serve online inferences. Deploying a model associates physical resources with the model so it can serve online inferences with low latency.

The topics covered are:

  1. Deploy a model to an endpoint
  2. Get an online inference using your deployed model

Before you begin

Before you can get online inferences, you must first train a model.

Deploy a model to an endpoint

You can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. For more information about options and use cases for deploying models, see About deploying models.

Use one of the following methods to deploy a model:

Google Cloud console

  1. In the Google Cloud console, in the Vertex AI section, go to the Models page.

    Go to the Models page

  2. Click the name of the model you want to deploy to open its details page.

  3. Select the Deploy & Test tab.

    If your model is already deployed to any endpoints, they are listed in the Deploy your model section.

  4. Click Deploy to endpoint.

  5. In the Define your endpoint page, configure as follows:

    1. You can choose to deploy your model to a new endpoint or an existing endpoint.

      • To deploy your model to a new endpoint, select Create new endpoint and provide a name for the new endpoint.
      • To deploy your model to an existing endpoint, select Add to existing endpoint and select the endpoint from the drop-down list.
      • You can add more than one model to an endpoint, and you can add a model to more than one endpoint. Learn more.
    2. Click Continue.

  6. In the Model settings page, configure as follows:

    1. If you're deploying your model to a new endpoint, accept 100 for the Traffic split. If you're deploying your model to an existing endpoint that has one or more models deployed to it, you must update the Traffic split percentage for the model you are deploying and the already deployed models so that all of the percentages add up to 100%.

    2. Enter the Minimum number of compute nodes you want to provide for your model.

      This is the number of nodes available to this model at all times. You are charged for the nodes used, whether to handle inference load or for standby (minimum) nodes, even without inference traffic. See the pricing page.

    3. Select your Machine type.

      Larger machine resources increase your inference performance and also increase costs.

    4. Learn how to change the default settings for inference logging.

    5. Click Continue.

  7. In the Model monitoring page, click Continue.

  8. In the Monitoring objectives page, configure as follows:

    1. Enter the location of your training data.
    2. Enter the name of the target column.
  9. Click Deploy to deploy your model to the endpoint.

API

When you deploy a model using the Vertex AI API, you complete the following steps:

  1. Create an endpoint if needed.
  2. Get the endpoint ID.
  3. Deploy the model to the endpoint.

Create an endpoint

If you are deploying a model to an existing endpoint, you can skip this step.

gcloud

The following example uses the gcloud ai endpoints create command:

gcloud ai endpoints create \
  --region=LOCATION_ID \
  --display-name=ENDPOINT_NAME

Replace the following:

  • LOCATION_ID : The region where you are using Vertex AI.
  • ENDPOINT_NAME : The display name for the endpoint.

    The Google Cloud CLI tool might take a few seconds to create the endpoint.

REST

Before using any of the request data, make the following replacements:

  • LOCATION_ID : Your region.
  • PROJECT_ID : Your project ID .
  • ENDPOINT_NAME : The display name for the endpoint.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints

Request JSON body:

{
  "display_name": "ENDPOINT_NAME"
}


You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}

You can poll for the status of the operation until the response includes "done": true.
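If you're scripting against the REST API, polling can be as simple as re-fetching the operation resource until done is true. The following minimal Python sketch uses the google-auth library; the region and operation name are placeholder assumptions taken from the response above:

import time

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Placeholder values for illustration only.
REGION = "us-central1"
OPERATION_NAME = (
    "projects/PROJECT_NUMBER/locations/us-central1/"
    "endpoints/ENDPOINT_ID/operations/OPERATION_ID"
)

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)
url = f"https://{REGION}-aiplatform.googleapis.com/v1/{OPERATION_NAME}"

# Re-fetch the operation until it reports completion.
operation = session.get(url).json()
while not operation.get("done"):
    time.sleep(10)  # simple fixed delay between polls
    operation = session.get(url).json()

print(operation)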

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateEndpointOperationMetadata;
import com.google.cloud.aiplatform.v1.Endpoint;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateEndpointSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String endpointDisplayName = "YOUR_ENDPOINT_DISPLAY_NAME";
    createEndpointSample(project, endpointDisplayName);
  }

  static void createEndpointSample(String project, String endpointDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);
      Endpoint endpoint = Endpoint.newBuilder().setDisplayName(endpointDisplayName).build();

      OperationFuture<Endpoint, CreateEndpointOperationMetadata> endpointFuture =
          endpointServiceClient.createEndpointAsync(locationName, endpoint);
      System.out.format("Operation name: %s\n", endpointFuture.getInitialFuture().get().getName());

      System.out.println("Waiting for operation to finish...");
      Endpoint endpointResponse = endpointFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Endpoint Response");
      System.out.format("Name: %s\n", endpointResponse.getName());
      System.out.format("Display Name: %s\n", endpointResponse.getDisplayName());
      System.out.format("Description: %s\n", endpointResponse.getDescription());
      System.out.format("Labels: %s\n", endpointResponse.getLabelsMap());
      System.out.format("Create Time: %s\n", endpointResponse.getCreateTime());
      System.out.format("Update Time: %s\n", endpointResponse.getUpdateTime());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const endpointDisplayName = 'YOUR_ENDPOINT_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function createEndpoint() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const endpoint = {
    displayName: endpointDisplayName,
  };
  const request = {
    parent,
    endpoint,
  };

  // Create the endpoint and print the resulting long-running operation
  const [response] = await endpointServiceClient.createEndpoint(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create endpoint response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tDescription : ${result.description}`);
  console.log(`\tLabels : ${JSON.stringify(result.labels)}`);
  console.log(`\tCreate time : ${JSON.stringify(result.createTime)}`);
  console.log(`\tUpdate time : ${JSON.stringify(result.updateTime)}`);
}
createEndpoint();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from google.cloud import aiplatform


def create_endpoint_sample(
    project: str,
    display_name: str,
    location: str,
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint.create(
        display_name=display_name,
        project=project,
        location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint
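For example, a call with placeholder values might look like the following; the project and endpoint names are assumptions:

# Example invocation with placeholder values.
endpoint = create_endpoint_sample(
    project="my-project",
    display_name="my-endpoint",
    location="us-central1",
)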

Retrieve the endpoint ID

You need the endpoint ID to deploy the model.

gcloud

The following example uses the gcloud ai endpoints list command:

gcloud ai endpoints list \
  --region=LOCATION_ID \
  --filter=display_name=ENDPOINT_NAME

Replace the following:

  • LOCATION_ID : The region where you are using Vertex AI.
  • ENDPOINT_NAME : The display name for the endpoint.

    Note the number that appears in the ENDPOINT_ID column. Use this ID in the following step.

REST

Before using any of the request data, make the following replacements:

  • LOCATION_ID : The region where you are using Vertex AI.
  • PROJECT_ID : Your project ID.
  • ENDPOINT_NAME : The display name for the endpoint.

HTTP method and URL:

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME


You should receive a JSON response similar to the following:

{
  "endpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",
      "displayName": "ENDPOINT_NAME",
      "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",
      "createTime": "2020-04-17T18:31:11.585169Z",
      "updateTime": "2020-04-17T18:35:08.568959Z"
    }
  ]
}

Note the ENDPOINT_ID.
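If you're working in Python, the Vertex AI SDK offers an equivalent lookup. Here's a minimal sketch; the project ID, region, and endpoint display name are placeholders:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="us-central1")

# Filter endpoints by display name and print each match's numeric ID.
for endpoint in aiplatform.Endpoint.list(filter='display_name="ENDPOINT_NAME"'):
    print(endpoint.name)  # this is the ENDPOINT_ID used in the next step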

Deploy the model

Select the tab below for your language or environment:

gcloud

The following examples use the gcloud ai endpoints deploy-model command.

The following example deploys a Model to an Endpoint without using GPUs to accelerate prediction serving and without splitting traffic between multiple DeployedModel resources:

Before using any of the command data below, make the following replacements:

  • ENDPOINT_ID : The ID for the endpoint.
  • LOCATION_ID : The region where you are using Vertex AI.
  • MODEL_ID : The ID for the model to be deployed.
  • DEPLOYED_MODEL_NAME : A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
  • MACHINE_TYPE : Optional. The machine resources used for each node of this deployment. Its default setting is n1-standard-2. Learn more about machine types.
  • MIN_REPLICA_COUNT : The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1. If the --min-replica-count flag is omitted, the value defaults to 1.
  • MAX_REPLICA_COUNT : The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, then the maximum number of nodes is set to the value of --min-replica-count.

Execute the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --machine-type=MACHINE_TYPE `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --machine-type=MACHINE_TYPE ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100

Splitting traffic

The --traffic-split=0=100 flag in the preceding examples sends 100% of prediction traffic that the Endpoint receives to the new DeployedModel, which is represented by the temporary ID 0. If your Endpoint already has other DeployedModel resources, then you can split traffic between the new DeployedModel and the old ones. For example, to send 20% of traffic to the new DeployedModel and 80% to an older one, run the following command.

Before using any of the command data below, make the following replacements:

  • OLD_DEPLOYED_MODEL_ID : The ID of the existing DeployedModel.

Execute the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --machine-type=MACHINE_TYPE `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --machine-type=MACHINE_TYPE ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
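With the Vertex AI SDK for Python, the same 20/80 split can be expressed as a dictionary passed to Model.deploy. A minimal sketch with placeholder resource IDs:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="us-central1")

endpoint = aiplatform.Endpoint("ENDPOINT_ID")
model = aiplatform.Model("MODEL_ID")

# "0" is the temporary key for the model being deployed in this call;
# all percentages must add up to 100.
model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-2",
    traffic_split={"0": 20, "OLD_DEPLOYED_MODEL_ID": 80},
)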

REST

Use the endpoints.deployModel method to deploy the model.

Before using any of the request data, make the following replacements:

  • LOCATION_ID : The region where you are using Vertex AI.
  • PROJECT_ID : Your project ID.
  • ENDPOINT_ID : The ID for the endpoint.
  • MODEL_ID : The ID for the model to be deployed.
  • DEPLOYED_MODEL_NAME : A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
  • MACHINE_TYPE : Optional. The machine resources used for each node of this deployment. Its default setting is n1-standard-2. Learn more about machine types.
  • ACCELERATOR_TYPE : The type of accelerator to be attached to the machine. Optional if ACCELERATOR_COUNT is not specified or is zero. Not recommended for AutoML models or custom-trained models that are using non-GPU images. Learn more.
  • ACCELERATOR_COUNT : The number of accelerators for each replica to use. Optional. Should be zero or unspecified for AutoML models or custom-trained models that are using non-GPU images.
  • MIN_REPLICA_COUNT : The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1.
  • MAX_REPLICA_COUNT : The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
  • REQUIRED_REPLICA_COUNT : Optional. The required number of nodes for this deployment to be marked as successful. Must be greater than or equal to 1 and less than or equal to the minimum number of nodes. If not specified, the default value is the minimum number of nodes.
  • TRAFFIC_SPLIT_THIS_MODEL : The percentage of the prediction traffic to this endpoint to be routed to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
  • DEPLOYED_MODEL_ID_N : Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
  • TRAFFIC_SPLIT_MODEL_N : The traffic split percentage value for the deployed model ID key.
  • PROJECT_NUMBER : Your project's automatically generated project number.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

Request JSON body:

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID",
    "displayName": "DEPLOYED_MODEL_NAME",
    "dedicatedResources": {
      "machineSpec": {
        "machineType": "MACHINE_TYPE",
        "acceleratorType": "ACCELERATOR_TYPE",
        "acceleratorCount": "ACCELERATOR_COUNT"
      },
      "minReplicaCount": MIN_REPLICA_COUNT,
      "maxReplicaCount": MAX_REPLICA_COUNT,
      "requiredReplicaCount": REQUIRED_REPLICA_COUNT
    }
  },
  "trafficSplit": {
    "0": TRAFFIC_SPLIT_THIS_MODEL,
    "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,
    "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2
  }
}


You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class DeployModelCustomTrainedModelSample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String endpointId = "ENDPOINT_ID";
    String modelName = "MODEL_NAME";
    String deployedModelDisplayName = "DEPLOYED_MODEL_DISPLAY_NAME";
    deployModelCustomTrainedModelSample(project, endpointId, modelName, deployedModelDisplayName);
  }

  static void deployModelCustomTrainedModelSample(
      String project, String endpointId, String model, String deployedModelDisplayName)
      throws IOException, ExecutionException, InterruptedException {
    EndpointServiceSettings settings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient client = EndpointServiceClient.create(settings)) {
      MachineSpec machineSpec = MachineSpec.newBuilder().setMachineType("n1-standard-2").build();
      DedicatedResources dedicatedResources =
          DedicatedResources.newBuilder().setMinReplicaCount(1).setMachineSpec(machineSpec).build();

      String modelName = ModelName.of(project, location, model).toString();
      DeployedModel deployedModel =
          DeployedModel.newBuilder()
              .setModel(modelName)
              .setDisplayName(deployedModelDisplayName)
              // `dedicated_resources` must be used for non-AutoML models
              .setDedicatedResources(dedicatedResources)
              .build();

      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);

      EndpointName endpoint = EndpointName.of(project, location, endpointId);
      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> response =
          client.deployModelAsync(endpoint, deployedModel, trafficSplit);

      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Operation name: %s\n", response.getInitialFuture().get().getName());

      // OperationFuture.get() will block until the operation is finished.
      DeployModelResponse deployModelResponse = response.get();
      System.out.format("deployModelResponse: %s\n", deployModelResponse);
    }
  }
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict, Optional, Sequence, Tuple

from google.cloud import aiplatform
from google.cloud.aiplatform import explain


def deploy_model_with_dedicated_resources_sample(
    project,
    location,
    model_name: str,
    machine_type: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    explanation_metadata: Optional[explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[explain.ExplanationParameters] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
      Example: "projects/123/locations/us-central1/models/456" or
      "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    # The explanation_metadata and explanation_parameters should only be
    # provided for a custom trained model and not an AutoML model.
    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model
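For example, an invocation with placeholder values might look like this:

# Example invocation with placeholder values.
model = deploy_model_with_dedicated_resources_sample(
    project="my-project",
    location="us-central1",
    model_name="456",  # model ID
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=1,
)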

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

const automl = require('@google-cloud/automl');
const client = new automl.v1beta1.AutoMlClient();

/**
 * Demonstrates using the AutoML client to create a model.
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = '[PROJECT_ID]' e.g., "my-gcloud-project";
// const computeRegion = '[REGION_NAME]' e.g., "us-central1";
// const datasetId = '[DATASET_ID]' e.g., "TBL2246891593778855936";
// const tableId = '[TABLE_ID]' e.g., "1991013247762825216";
// const columnId = '[COLUMN_ID]' e.g., "773141392279994368";
// const modelName = '[MODEL_NAME]' e.g., "testModel";
// const trainBudget = '[TRAIN_BUDGET]' e.g., "1000",
// `Train budget in milli node hours`;

// A resource that represents Google Cloud Platform location.
const projectLocation = client.locationPath(projectId, computeRegion);

// Get the full path of the column.
const columnSpecId = client.columnSpecPath(
  projectId,
  computeRegion,
  datasetId,
  tableId,
  columnId
);

// Set target column to train the model.
const targetColumnSpec = {name: columnSpecId};

// Set tables model metadata.
const tablesModelMetadata = {
  targetColumnSpec: targetColumnSpec,
  trainBudgetMilliNodeHours: trainBudget,
};

// Set datasetId, model name and model metadata for the dataset.
const myModel = {
  datasetId: datasetId,
  displayName: modelName,
  tablesModelMetadata: tablesModelMetadata,
};

// Create a model with the model metadata in the region.
client
  .createModel({parent: projectLocation, model: myModel})
  .then(responses => {
    const initialApiResponse = responses[1];
    console.log(`Training operation name: ${initialApiResponse.name}`);
    console.log('Training started...');
  })
  .catch(err => {
    console.error(err);
  });

Learn how to change the default settings for inference logging.

Get operation status

Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.

Get an online inference using your deployed model

To make an online inference, submit one or more test items to a model for analysis, and the model returns results that are based on your model's objective. Use the Google Cloud console or the Vertex AI API to request an online inference.

Google Cloud console

  1. In the Google Cloud console, in the Vertex AI section, go to the Models page.

    Go to the Models page

  2. From the list of models, click the name of the model to request inferences from.

  3. Select the Deploy & test tab.

  4. Under the Test your model section, add test items to request an inference. The baseline inference data is filled in for you, or you can enter your own inference data and click Predict.

    After the inference is complete, Vertex AI returns the results in the console.

API: Classification

gcloud

  1. Create a file named request.json with the following contents:

    {
      "instances": [
        { PREDICTION_DATA_ROW }
      ]
    }

    Replace the following:

    • PREDICTION_DATA_ROW : A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, the row of data might look like the following example request:

      "length":3.6,
      "material":"cotton",
      "tag_array": ["abc","def"]

      A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. Refer to Data format for predictions for details.

  2. Run the following command:

    gcloud ai endpoints predict ENDPOINT_ID \
      --region=LOCATION_ID \
      --json-request=request.json

    Replace the following:

    • ENDPOINT_ID : The ID for the endpoint.
    • LOCATION_ID : The region where you are using Vertex AI.

REST

You use the endpoints.predict method to request an online inference.

Before using any of the request data, make the following replacements:

  • LOCATION_ID : Region where Endpoint is located. For example, us-central1.
  • PROJECT_ID : Your project ID.
  • ENDPOINT_ID : The ID for the endpoint.
  • PREDICTION_DATA_ROW : A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with a number, an array of strings, and a category, the row of data might look like the following example request:

    "length":3.6,
    "material":"cotton",
    "tag_array": ["abc","def"]

    A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. Refer to Data format for predictions for details.

  • DEPLOYED_MODEL_ID : Output by the predict method. The ID of the model used to generate the inference.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict

Request JSON body:

{
  "instances": [
    { PREDICTION_DATA_ROW }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "predictions": [
    {
      "scores": [
        0.96771615743637085,
        0.032283786684274673
      ],
      "classes": [
        "0",
        "1"
      ]
    }
  ],
  "deployedModelId": "2429510197"
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.cloud.aiplatform.v1.schema.predict.prediction.TabularClassificationPredictionResult;
import com.google.protobuf.ListValue;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.List;

public class PredictTabularClassificationSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String instance = "[{ \"feature_column_a\": \"value\", \"feature_column_b\": \"value\"}]";
    String endpointId = "YOUR_ENDPOINT_ID";
    predictTabularClassification(instance, project, endpointId);
  }

  static void predictTabularClassification(String instance, String project, String endpointId)
      throws IOException {
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      ListValue.Builder listValue = ListValue.newBuilder();
      JsonFormat.parser().merge(instance, listValue);
      List<Value> instanceList = listValue.getValuesList();

      Value parameters = Value.newBuilder().setListValue(listValue).build();
      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instanceList, parameters);
      System.out.println("Predict Tabular Classification Response");
      System.out.format("\tDeployed Model Id: %s\n", predictResponse.getDeployedModelId());

      System.out.println("Predictions");
      for (Value prediction : predictResponse.getPredictionsList()) {
        TabularClassificationPredictionResult.Builder resultBuilder =
            TabularClassificationPredictionResult.newBuilder();
        TabularClassificationPredictionResult result =
            (TabularClassificationPredictionResult)
                ValueConverter.fromValue(resultBuilder, prediction);

        for (int i = 0; i < result.getClassesCount(); i++) {
          System.out.printf("\tClass: %s", result.getClasses(i));
          System.out.printf("\tScore: %f", result.getScores(i));
        }
      }
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const endpointId = 'YOUR_ENDPOINT_ID';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const aiplatform = require('@google-cloud/aiplatform');
const {prediction} = aiplatform.protos.google.cloud.aiplatform.v1.schema.predict;

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictTablesClassification() {
  // Configure the endpoint resource
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
  const parameters = helpers.toValue({});

  const instance = helpers.toValue({
    petal_length: '1.4',
    petal_width: '1.3',
    sepal_length: '5.1',
    sepal_width: '2.8',
  });

  const instances = [instance];
  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);
  console.log('Predict tabular classification response');
  console.log(`\tDeployed model id : ${response.deployedModelId}\n`);
  const predictions = response.predictions;
  console.log('Predictions :');
  for (const predictionResultVal of predictions) {
    const predictionResultObj =
      prediction.TabularClassificationPredictionResult.fromValue(
        predictionResultVal
      );
    for (const [i, class_] of predictionResultObj.classes.entries()) {
      console.log(`\tClass: ${class_}`);
      console.log(`\tScore: ${predictionResultObj.scores[i]}\n\n`);
    }
  }
}
predictTablesClassification();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict, List

from google.cloud import aiplatform


def predict_tabular_classification_sample(
    project: str,
    location: str,
    endpoint_name: str,
    instances: List[Dict],
):
    """
    Args
      project: Your project ID or project number.
      location: Region where Endpoint is located. For example, 'us-central1'.
      endpoint_name: A fully qualified endpoint name or endpoint ID.
        Example: "projects/123/locations/us-central1/endpoints/456" or
        "456" when project and location are initialized or passed.
      instances: A list of one or more instances (examples) to return a prediction for.
    """
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint_name)

    response = endpoint.predict(instances=instances)

    for prediction_ in response.predictions:
        print(prediction_)
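For example, reusing the classification data row from the REST example above (project and endpoint values are placeholders):

predict_tabular_classification_sample(
    project="my-project",
    location="us-central1",
    endpoint_name="456",  # endpoint ID
    instances=[{"length": 3.6, "material": "cotton", "tag_array": ["abc", "def"]}],
)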

API: Regression

gcloud

  1. Create a file named request.json with the following contents:

    {
      "instances": [
        { PREDICTION_DATA_ROW }
      ]
    }

    Replace the following:

    • PREDICTION_DATA_ROW : A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with a number, an array of numbers, and a category, the row of data might look like the following example request:

      "age":3.6,
      "sq_ft":5392,
      "code": "90331"

      A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. Refer to Data format for predictions for details.

  2. Run the following command:

    gcloud ai endpoints predict ENDPOINT_ID \
      --region=LOCATION_ID \
      --json-request=request.json

    Replace the following:

    • ENDPOINT_ID : The ID for the endpoint.
    • LOCATION_ID : The region where you are using Vertex AI.

REST

You use the endpoints.predict method to request an online inference.

Before using any of the request data, make the following replacements:

  • LOCATION_ID : Region where Endpoint is located. For example, us-central1.
  • PROJECT_ID : Your project ID.
  • ENDPOINT_ID : The ID for the endpoint.
  • PREDICTION_DATA_ROW : A JSON object with keys as the feature names and values as the corresponding feature values. For example, for a dataset with a number, an array of numbers, and a category, the row of data might look like the following example request:

    "age":3.6,
    "sq_ft":5392,
    "code": "90331"

    A value must be provided for every feature included in training. The format of the data used for prediction must match the format used for training. Refer to Data format for predictions for details.

  • DEPLOYED_MODEL_ID : Output by the predict method. The ID of the model used to generate the inference.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict

Request JSON body:

{
  "instances": [
    { PREDICTION_DATA_ROW }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "predictions": [
    [
      {
        "value": 65.14233
      }
    ]
  ],
  "deployedModelId": "DEPLOYED_MODEL_ID"
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.aiplatform.util.ValueConverter;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.cloud.aiplatform.v1.schema.predict.prediction.TabularRegressionPredictionResult;
import com.google.protobuf.ListValue;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.List;

public class PredictTabularRegressionSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String instance = "[{ \"feature_column_a\": \"value\", \"feature_column_b\": \"value\"}]";
    String endpointId = "YOUR_ENDPOINT_ID";
    predictTabularRegression(instance, project, endpointId);
  }

  static void predictTabularRegression(String instance, String project, String endpointId)
      throws IOException {
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);

      ListValue.Builder listValue = ListValue.newBuilder();
      JsonFormat.parser().merge(instance, listValue);
      List<Value> instanceList = listValue.getValuesList();

      Value parameters = Value.newBuilder().setListValue(listValue).build();
      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instanceList, parameters);
      System.out.println("Predict Tabular Regression Response");
      System.out.format("\tDeployed Model Id: %s\n", predictResponse.getDeployedModelId());

      System.out.println("Predictions");
      for (Value prediction : predictResponse.getPredictionsList()) {
        TabularRegressionPredictionResult.Builder resultBuilder =
            TabularRegressionPredictionResult.newBuilder();

        TabularRegressionPredictionResult result =
            (TabularRegressionPredictionResult) ValueConverter.fromValue(resultBuilder, prediction);

        System.out.printf("\tUpper bound: %f\n", result.getUpperBound());
        System.out.printf("\tLower bound: %f\n", result.getLowerBound());
        System.out.printf("\tValue: %f\n", result.getValue());
      }
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const endpointId = 'YOUR_ENDPOINT_ID';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const aiplatform = require('@google-cloud/aiplatform');
const {prediction} = aiplatform.protos.google.cloud.aiplatform.v1.schema.predict;

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function predictTablesRegression() {
  // Configure the endpoint resource
  const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
  const parameters = helpers.toValue({});

  // TODO (erschmid): Make this less painful
  const instance = helpers.toValue({
    BOOLEAN_2unique_NULLABLE: false,
    DATETIME_1unique_NULLABLE: '2019-01-01 00:00:00',
    DATE_1unique_NULLABLE: '2019-01-01',
    FLOAT_5000unique_NULLABLE: 1611,
    FLOAT_5000unique_REPEATED: [2320, 1192],
    INTEGER_5000unique_NULLABLE: '8',
    NUMERIC_5000unique_NULLABLE: 16,
    STRING_5000unique_NULLABLE: 'str-2',
    STRUCT_NULLABLE: {
      BOOLEAN_2unique_NULLABLE: false,
      DATE_1unique_NULLABLE: '2019-01-01',
      DATETIME_1unique_NULLABLE: '2019-01-01 00:00:00',
      FLOAT_5000unique_NULLABLE: 1308,
      FLOAT_5000unique_REPEATED: [2323, 1178],
      FLOAT_5000unique_REQUIRED: 3089,
      INTEGER_5000unique_NULLABLE: '1777',
      NUMERIC_5000unique_NULLABLE: 3323,
      TIME_1unique_NULLABLE: '23:59:59.999999',
      STRING_5000unique_NULLABLE: 'str-49',
      TIMESTAMP_1unique_NULLABLE: '1546387199999999',
    },
    TIMESTAMP_1unique_NULLABLE: '1546387199999999',
    TIME_1unique_NULLABLE: '23:59:59.999999',
  });

  const instances = [instance];
  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);
  console.log('Predict tabular regression response');
  console.log(`\tDeployed model id : ${response.deployedModelId}`);
  const predictions = response.predictions;
  console.log('\tPredictions :');
  for (const predictionResultVal of predictions) {
    const predictionResultObj =
      prediction.TabularRegressionPredictionResult.fromValue(predictionResultVal);
    console.log(`\tUpper bound: ${predictionResultObj.upper_bound}`);
    console.log(`\tLower bound: ${predictionResultObj.lower_bound}`);
    console.log(`\tValue: ${predictionResultObj.value}`);
  }
}
predictTablesRegression();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict, List

from google.cloud import aiplatform


def predict_tabular_regression_sample(
    project: str,
    location: str,
    endpoint_name: str,
    instances: List[Dict],
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint(endpoint_name)

    response = endpoint.predict(instances=instances)

    for prediction_ in response.predictions:
        print(prediction_)
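For example, reusing the regression data row from the REST example above (project and endpoint values are placeholders):

predict_tabular_regression_sample(
    project="my-project",
    location="us-central1",
    endpoint_name="456",  # endpoint ID
    instances=[{"age": 3.6, "sq_ft": 5392, "code": "90331"}],
)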

Interpret prediction results

Classification

Classification models return a confidence score.

The confidence score communicates how strongly your model associates each class or label with a test item. The higher the number, the higher the model's confidence that the label should be applied to that item. You decide how high the confidence score must be for you to accept the model's results.
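For example, here is a minimal Python sketch that applies an acceptance threshold to a classification result shaped like the response shown earlier; the threshold value is an application-specific assumption:

# One prediction from the classification response above.
prediction = {
    "scores": [0.96771615743637085, 0.032283786684274673],
    "classes": ["0", "1"],
}

CONFIDENCE_THRESHOLD = 0.8  # assumed, application-specific cutoff

best_score = max(prediction["scores"])
best_class = prediction["classes"][prediction["scores"].index(best_score)]

if best_score >= CONFIDENCE_THRESHOLD:
    print(f"Accept class {best_class} (score {best_score:.4f})")
else:
    print("Score below threshold; route the item for manual review")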

Regression

Regression models return an inference value.

If your model uses probabilistic inference, the value field contains the minimizer of the optimization objective. For example, if your optimization objective is minimize-rmse, the value field contains the mean value. If it is minimize-mae, the value field contains the median value.

If your model uses probabilistic inference with quantiles, Vertex AI provides quantile values and inferences in addition to the minimizer of the optimization objective. Quantile values are set during model training. Quantile inferences are the inference values associated with the quantile values.

TabNet provides inherent model interpretability by giving users insight into which features it used to help make its decision. The algorithm utilizes attention, which learns to selectively enhance the influence of some features while diminishing the influence of others through a weighted average. For a particular decision, TabNet decides in a stepwise fashion how much importance to place on each feature. It then combines each of the steps to create a final prediction. The attention is multiplicative, where larger values indicate that the feature played a larger role in the prediction and a value of zero means that the feature played no role in that decision. Because TabNet uses multiple decision steps, the attention placed on the features across all of the steps is linearly combined after appropriate scaling. This linear combination across all of TabNet's decision steps is the total feature importance that TabNet provides you.
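As an illustration only (not TabNet's actual implementation), the following sketch shows how per-step attention masks could be linearly combined into a single total-importance vector; the masks and step weights are made-up numbers:

import numpy as np

# Hypothetical attention masks: step_masks[s, f] is the attention that
# decision step s places on feature f (zero means the feature was unused).
step_masks = np.array([
    [0.7, 0.3, 0.0],
    [0.2, 0.0, 0.8],
    [0.5, 0.5, 0.0],
])

# Hypothetical per-step scaling (how much each step contributed overall).
step_weights = np.array([0.5, 0.3, 0.2])

# Linear combination across steps, then normalize for readability.
total_importance = step_weights @ step_masks
total_importance /= total_importance.sum()

print(dict(zip(["feature_a", "feature_b", "feature_c"], total_importance.round(3))))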

Example output for inferences

The return payload for an online inference with feature importance from a regression model looks similar to the following example.

{
  "predictions": [
    {
      "value": 0.3723912537097931,
      "feature_importance": {
        "MSSubClass": 0.12,
        "MSZoning": 0.33,
        "LotFrontage": 0.27,
        "LotArea": 0.06,
        ...
      }
    }
  ]
}
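A short Python sketch for consuming such a payload, ranking features by their importance; the response literal mirrors the example above:

# Example payload mirroring the response above.
response = {
    "predictions": [
        {
            "value": 0.3723912537097931,
            "feature_importance": {
                "MSSubClass": 0.12,
                "MSZoning": 0.33,
                "LotFrontage": 0.27,
                "LotArea": 0.06,
            },
        }
    ]
}

for pred in response["predictions"]:
    ranked = sorted(
        pred["feature_importance"].items(), key=lambda kv: kv[1], reverse=True
    )
    print(f"value = {pred['value']:.4f}")
    for feature, weight in ranked:
        print(f"  {feature}: {weight}")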
