Get an online prediction

The Online Prediction service of Vertex AI lets you make synchronous requests to your own prediction model endpoint.

This page shows you how to send requests to your model so that it can serve online predictions with low latency.

Before you begin

Before you can start using the Online Prediction API, you must have a project and appropriate credentials.

Follow these steps before getting an online prediction:

  1. Set up a project for Vertex AI.
  2. To get the permissions that you need to access Online Prediction, ask your Project IAM Admin to grant you the Vertex AI Prediction User (vertex-ai-prediction-user) role.

    For information about this role, see Prepare IAM permissions.

  3. Create and train a prediction model targeting one of the supported containers.

  4. Create the prediction cluster and ensure your project allows incoming external traffic.

  5. Export your model artifacts for prediction.

  6. Deploy your model to an endpoint.

  7. Show details of the Endpoint custom resource of your prediction model:

     kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'

    Replace the following:

    • PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the prediction cluster.
    • PREDICTION_ENDPOINT: the name of the endpoint.
    • PROJECT_NAMESPACE: the name of the prediction project namespace.

    The output must show the status field, with the fully qualified domain name of the endpoint in the endpointFQDN field. Record this endpoint URL to use it in your requests. If you script this setup, see the sketch after these steps.
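The following is a minimal sketch that captures the endpoint FQDN from Python by shelling out to the same kubectl command shown in the last step; the kubeconfig path, endpoint name, and namespace are placeholder values to replace with your own.

    import subprocess

    # Placeholder values; replace with your kubeconfig path, endpoint
    # name, and prediction project namespace.
    KUBECONFIG = "prediction-cluster-kubeconfig"
    PREDICTION_ENDPOINT = "my-endpoint"
    PROJECT_NAMESPACE = "my-project-namespace"

    # Run the same kubectl command as in the last step and capture the
    # fully qualified domain name it prints.
    endpoint_fqdn = subprocess.run(
        [
            "kubectl",
            "--kubeconfig", KUBECONFIG,
            "get", "endpoint", PREDICTION_ENDPOINT,
            "-n", PROJECT_NAMESPACE,
            "-o", "jsonpath={.status.endpointFQDN}",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()

    print(endpoint_fqdn)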

Set your environment variables

If you want to send a request to your model endpoint from a Python script, and you set up a service account in your project to make authorized API calls programmatically, you can define environment variables in the script to access values such as the service account keys at run time.

Follow these steps to set the required environment variables in a Python script. A sketch for verifying the resulting credentials follows these steps.

  1. Create a JupyterLab notebook to interact with the Online Prediction API.

  2. Create a Python script in the JupyterLab notebook.

  3. Add the following code to the Python script:

      import os

      os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "APPLICATION_DEFAULT_CREDENTIALS_FILENAME"

    Replace APPLICATION_DEFAULT_CREDENTIALS_FILENAME with the name of the JSON file that contains the service account keys you created in the project, such as my-service-key.json.

  4. Save the Python script with a name, such as prediction.py.

  5. Run the Python script to set the environment variables:

      python SCRIPT_NAME

    Replace SCRIPT_NAME with the name you gave to your Python script, such as prediction.py.
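To verify that GOOGLE_APPLICATION_CREDENTIALS points at usable credentials, you can load them with the google-auth library. This is a minimal sketch; my-service-key.json is a hypothetical key file name.

    import os

    import google.auth

    # Hypothetical key file name; use the JSON file that contains your
    # service account keys.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "my-service-key.json"

    # google.auth.default() reads GOOGLE_APPLICATION_CREDENTIALS and
    # returns the credentials together with the associated project ID.
    credentials, project_id = google.auth.default()
    print(project_id)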

Send a request to an endpoint

Make a request to the model's endpoint to get an online prediction:

curl

Follow these steps to make a curl request:

  1. Create a JSON file named request.json for your request body.

    Add and format your input for online prediction according to the request body details that the target container requires. A sketch of one common request body shape follows these curl steps.

  2. Get an authentication token.

  3. Make the request:

     curl -X POST \
       -H "Content-Type: application/json; charset=utf-8" \
       -H "Authorization: Bearer TOKEN" \
       https://ENDPOINT:443/v1/model:predict \
       -d @request.json

    Replace the following:

    • TOKEN: the authentication token you obtained.
    • ENDPOINT: your model endpoint for the online prediction request.

If successful, you receive a JSON response to your online prediction request.

The following output shows an example:

  {
    "predictions": [[-357.10849], [-171.621658]]
  }

For more information about responses, see Response body details.
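The exact request body depends on the target container. As an illustration only, a container that follows the common {"instances": [...]} convention (the same shape the Python script in the next section reads) accepts a body like the one this sketch writes; the feature values are hypothetical.

    import json

    # Hypothetical input rows; the number and meaning of the features
    # depend on the model that your target container serves.
    request_body = {"instances": [[25.0, 1.5, 0.3], [30.0, 2.0, 0.1]]}

    # Write the body to request.json for use with curl -d @request.json.
    with open("request.json", "w") as f:
        json.dump(request_body, f, indent=2)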

Python

Follow these steps to use the Online Prediction service from a Python script:

  1. Create a JSON file named request.json for your request body.

    Add and format your input for online prediction according to the request body details that the target container requires, as in the curl steps.

  2. Install the latest version of the Vertex AI Platform client library.

  3. Set the required environment variables in a Python script.

  4. Authenticate your API request.

  5. Add the following code to the Python script you created:

      import json
      import os
      from typing import Sequence

      # google.auth is needed for google.auth.default() below.
      import google.auth
      import grpc
      from absl import app
      from absl import flags
      from google.auth.transport import requests
      from google.protobuf import json_format
      from google.protobuf.struct_pb2 import Value
      from google.cloud.aiplatform_v1.services import prediction_service

      _INPUT = flags.DEFINE_string("input", None, "input", required=True)
      _HOST = flags.DEFINE_string("host", None, "Prediction endpoint", required=True)
      _ENDPOINT_ID = flags.DEFINE_string("endpoint_id", None, "endpoint id", required=True)

      os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "path-to-ca-cert-file.cert"

      # ENDPOINT_RESOURCE_NAME is a placeholder value that doesn't affect prediction behavior.
      ENDPOINT_RESOURCE_NAME = "projects/000000000000/locations/us-central1/endpoints/00000000000000"

      # get_sts_token exchanges the application default credentials for a
      # token whose audience is the prediction endpoint.
      def get_sts_token(host):
          creds = None
          try:
              creds, _ = google.auth.default()
              creds = creds.with_gdch_audience(host + ":443")
              req = requests.Request()
              creds.refresh(req)
              print("Got token: ")
              print(creds.token)
          except Exception as e:
              print("Caught exception" + str(e))
              raise e
          return creds.token

      # predict_client_secure builds a client that requires TLS.
      def predict_client_secure(host, token):
          with open(os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"], "rb") as f:
              channel_creds = grpc.ssl_channel_credentials(f.read())
          call_creds = grpc.access_token_call_credentials(token)
          creds = grpc.composite_channel_credentials(
              channel_creds,
              call_creds,
          )
          client = prediction_service.PredictionServiceClient(
              transport=prediction_service.transports.grpc.PredictionServiceGrpcTransport(
                  channel=grpc.secure_channel(target=host + ":443", credentials=creds)
              )
          )
          return client

      # predict_func sends the instances to the endpoint, routing the call
      # with the x-vertex-ai-endpoint-id metadata header.
      def predict_func(client, instances):
          resp = client.predict(
              endpoint=ENDPOINT_RESOURCE_NAME,
              instances=instances,
              metadata=[("x-vertex-ai-endpoint-id", _ENDPOINT_ID.value)],
          )
          print(resp)

      def main(argv: Sequence[str]):
          del argv  # Unused.
          with open(_INPUT.value) as json_file:
              data = json.load(json_file)
              instances = [json_format.ParseDict(s, Value()) for s in data["instances"]]
          token = get_sts_token(_HOST.value)
          client = predict_client_secure(_HOST.value, token)
          predict_func(client=client, instances=instances)

      if __name__ == "__main__":
          app.run(main)
    
  6. Save the Python script with a name, such as prediction.py.

  7. Make the request to the prediction server:

     python SCRIPT_NAME --input request.json \
       --host ENDPOINT \
       --endpoint_id ENDPOINT_ID

    Replace the following:

    • SCRIPT_NAME: the name of the Python script, such as prediction.py.
    • ENDPOINT: your model endpoint for the online prediction request.
    • ENDPOINT_ID: the value of the endpoint ID.

If successful, you receive a JSON response to your online prediction request. For more information about responses, see Response body details.
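To work with the result programmatically instead of only printing it, you can read the predictions field of the PredictResponse that client.predict() returns. This is a minimal sketch, assuming numeric predictions like the example output shown earlier.

    from google.protobuf import json_format

    # resp is the PredictResponse returned by client.predict() in
    # predict_func() above.
    def print_predictions(resp):
        for prediction in resp.predictions:
            # Each prediction is a protobuf Value; MessageToDict converts
            # it to the corresponding plain Python object, such as a list
            # of floats.
            print(json_format.MessageToDict(prediction))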
