Get an online prediction

The Online Prediction service of Vertex AI lets you make synchronous requests to your own prediction model endpoint.

This page shows you how to send requests to your model so that it can serve online predictions with low latency.

Before you begin

Before you can start using the Online Prediction API, you must have a project and appropriate credentials.

Follow these steps before getting an online prediction:

  1. Set up a project for Vertex AI.
  2. To get the permissions that you need to access Online Prediction, ask your Project IAM Admin to grant you the Vertex AI Prediction User (vertex-ai-prediction-user) role.

    For information about this role, see Prepare IAM permissions.

  3. Create and train a prediction model targeting one of the supported containers.

  4. Create the prediction cluster and ensure your project allows incoming external traffic.

  5. Export your model artifacts for prediction.

  6. Deploy your model to an endpoint.

  7. Show details of the Endpoint custom resource of your prediction model:

     kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'

    Replace the following:

    • PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the prediction cluster.
    • PREDICTION_ENDPOINT: the name of the endpoint.
    • PROJECT_NAMESPACE: the name of the prediction project namespace.

    The output must show the status field, with the fully qualified domain name of the endpoint in the endpointFQDN field. Record this endpoint URL to use it in your requests. If you script this setup, see the sketch after these steps.
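The following is a minimal sketch that captures the endpoint FQDN from Python by shelling out to the same kubectl command shown in the last step; the kubeconfig path, endpoint name, and namespace are placeholder values to replace with your own.

    import subprocess

    # Placeholder values; replace with your kubeconfig path, endpoint
    # name, and prediction project namespace.
    KUBECONFIG = "prediction-cluster-kubeconfig"
    PREDICTION_ENDPOINT = "my-endpoint"
    PROJECT_NAMESPACE = "my-project-namespace"

    # Run the same kubectl command as in the last step and capture the
    # fully qualified domain name it prints.
    endpoint_fqdn = subprocess.run(
        [
            "kubectl",
            "--kubeconfig", KUBECONFIG,
            "get", "endpoint", PREDICTION_ENDPOINT,
            "-n", PROJECT_NAMESPACE,
            "-o", "jsonpath={.status.endpointFQDN}",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()

    print(endpoint_fqdn)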

Set your environment variables

If you want to send a request to your model endpoint from a Python script, and you set up a service account in your project to make authorized API calls programmatically, you can define environment variables in the script to access values such as the service account keys at run time.

Follow these steps to set the required environment variables in a Python script. A sketch for verifying the resulting credentials follows these steps.

  1. Create a JupyterLab notebook to interact with the Online Prediction API.

  2. Create a Python script in the JupyterLab notebook.

  3. Add the following code to the Python script:

      import os

      os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "APPLICATION_DEFAULT_CREDENTIALS_FILENAME"

    Replace APPLICATION_DEFAULT_CREDENTIALS_FILENAME with the name of the JSON file that contains the service account keys you created in the project, such as my-service-key.json.

  4. Save the Python script with a name, such as prediction.py.

  5. Run the Python script to set the environment variables:

      python SCRIPT_NAME

    Replace SCRIPT_NAME with the name you gave to your Python script, such as prediction.py.
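To verify that GOOGLE_APPLICATION_CREDENTIALS points at usable credentials, you can load them with the google-auth library. This is a minimal sketch; my-service-key.json is a hypothetical key file name.

    import os

    import google.auth

    # Hypothetical key file name; use the JSON file that contains your
    # service account keys.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "my-service-key.json"

    # google.auth.default() reads GOOGLE_APPLICATION_CREDENTIALS and
    # returns the credentials together with the associated project ID.
    credentials, project_id = google.auth.default()
    print(project_id)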

Send a request to an endpoint

Make a request to the model's endpoint to get an online prediction:

curl

Follow these steps to make a curl request:

  1. Create a JSON file named request.json for your request body.

    Add and format your input for online prediction according to the request body details that the target container requires. A sketch of one common request body shape follows these curl steps.

  2. Get an authentication token.

  3. Make the request:

     curl -X POST \
       -H "Content-Type: application/json; charset=utf-8" \
       -H "Authorization: Bearer TOKEN" \
       https://ENDPOINT:443/v1/model:predict \
       -d @request.json

    Replace the following:

    • TOKEN: the authentication token you obtained.
    • ENDPOINT: your model endpoint for the online prediction request.

If successful, you receive a JSON response to your online prediction request.

The following output shows an example:

  {
    "predictions": [[-357.10849], [-171.621658]]
  }

For more information about responses, see Response body details.
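The exact request body depends on the target container. As an illustration only, a container that follows the common {"instances": [...]} convention (the same shape the Python script in the next section reads) accepts a body like the one this sketch writes; the feature values are hypothetical.

    import json

    # Hypothetical input rows; the number and meaning of the features
    # depend on the model that your target container serves.
    request_body = {"instances": [[25.0, 1.5, 0.3], [30.0, 2.0, 0.1]]}

    # Write the body to request.json for use with curl -d @request.json.
    with open("request.json", "w") as f:
        json.dump(request_body, f, indent=2)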

Python

Follow these steps to use the Online Prediction service from a Python script:

  1. Create a JSON file named request.json for your request body.

    Add and format your input for online prediction according to the request body details that the target container requires, as in the curl steps.

  2. Install the latest version of the Vertex AI Platform client library.

  3. Set the required environment variables in a Python script.

  4. Authenticate your API request.

  5. Add the following code to the Python script you created:

      import json
      import os
      from typing import Sequence

      # google.auth is needed for google.auth.default() below.
      import google.auth
      import grpc
      from absl import app
      from absl import flags
      from google.auth.transport import requests
      from google.protobuf import json_format
      from google.protobuf.struct_pb2 import Value
      from google.cloud.aiplatform_v1.services import prediction_service

      _INPUT = flags.DEFINE_string("input", None, "input", required=True)
      _HOST = flags.DEFINE_string("host", None, "Prediction endpoint", required=True)
      _ENDPOINT_ID = flags.DEFINE_string("endpoint_id", None, "endpoint id", required=True)

      os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "path-to-ca-cert-file.cert"

      # ENDPOINT_RESOURCE_NAME is a placeholder value that doesn't affect prediction behavior.
      ENDPOINT_RESOURCE_NAME = "projects/000000000000/locations/us-central1/endpoints/00000000000000"

      # get_sts_token exchanges the application default credentials for a
      # token whose audience is the prediction endpoint.
      def get_sts_token(host):
          creds = None
          try:
              creds, _ = google.auth.default()
              creds = creds.with_gdch_audience(host + ":443")
              req = requests.Request()
              creds.refresh(req)
              print("Got token: ")
              print(creds.token)
          except Exception as e:
              print("Caught exception" + str(e))
              raise e
          return creds.token

      # predict_client_secure builds a client that requires TLS.
      def predict_client_secure(host, token):
          with open(os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"], "rb") as f:
              channel_creds = grpc.ssl_channel_credentials(f.read())
          call_creds = grpc.access_token_call_credentials(token)
          creds = grpc.composite_channel_credentials(
              channel_creds,
              call_creds,
          )
          client = prediction_service.PredictionServiceClient(
              transport=prediction_service.transports.grpc.PredictionServiceGrpcTransport(
                  channel=grpc.secure_channel(target=host + ":443", credentials=creds)
              )
          )
          return client

      # predict_func sends the instances to the endpoint, routing the call
      # with the x-vertex-ai-endpoint-id metadata header.
      def predict_func(client, instances):
          resp = client.predict(
              endpoint=ENDPOINT_RESOURCE_NAME,
              instances=instances,
              metadata=[("x-vertex-ai-endpoint-id", _ENDPOINT_ID.value)],
          )
          print(resp)

      def main(argv: Sequence[str]):
          del argv  # Unused.
          with open(_INPUT.value) as json_file:
              data = json.load(json_file)
              instances = [json_format.ParseDict(s, Value()) for s in data["instances"]]
          token = get_sts_token(_HOST.value)
          client = predict_client_secure(_HOST.value, token)
          predict_func(client=client, instances=instances)

      if __name__ == "__main__":
          app.run(main)
    
  6. Save the Python script with a name, such as prediction.py.

  7. Make the request to the prediction server:

     python SCRIPT_NAME --input request.json \
       --host ENDPOINT \
       --endpoint_id ENDPOINT_ID

    Replace the following:

    • SCRIPT_NAME: the name of the Python script, such as prediction.py.
    • ENDPOINT: your model endpoint for the online prediction request.
    • ENDPOINT_ID: the value of the endpoint ID.

If successful, you receive a JSON response to your online prediction request. For more information about responses, see Response body details.
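To work with the result programmatically instead of only printing it, you can read the predictions field of the PredictResponse that client.predict() returns. This is a minimal sketch, assuming numeric predictions like the example output shown earlier.

    from google.protobuf import json_format

    # resp is the PredictResponse returned by client.predict() in
    # predict_func() above.
    def print_predictions(resp):
        for prediction in resp.predictions:
            # Each prediction is a protobuf Value; MessageToDict converts
            # it to the corresponding plain Python object, such as a list
            # of floats.
            print(json_format.MessageToDict(prediction))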
