Before you begin
This tutorial assumes that you have read and followed the instructions in:
- Develop a custom agent: to develop a custom agent.
- User authentication: to authenticate as a user for querying the agent.
- Import and initialize the SDK: to initialize the client for getting a deployed instance (if needed).
Get an instance of an agent
To query an agent, you first need an instance of an agent. You can either create a new instance or get an existing instance of an agent.
To get the agent corresponding to a specific resource ID:
Vertex AI SDK for Python
Run the following code:
```python
import vertexai

client = vertexai.Client(  # For service interactions via client.agent_engines
    project="PROJECT_ID",
    location="LOCATION",
)

agent = client.agent_engines.get(
    name="projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)

print(agent)
```
 
 
where:
- PROJECT_ID is the Google Cloud project ID under which you develop and deploy agents.
- LOCATION is one of the supported regions.
- RESOURCE_ID is the ID of the deployed agent as a reasoningEngine resource.
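The resource name follows a fixed pattern, so it can be assembled from its three components. A minimal sketch (the helper name and example values are hypothetical, not part of the SDK):

```python
# Hypothetical helper, not part of the Vertex AI SDK: build the full
# reasoningEngine resource name from its components.
def agent_resource_name(project_id: str, location: str, resource_id: str) -> str:
    return f"projects/{project_id}/locations/{location}/reasoningEngines/{resource_id}"

name = agent_resource_name("my-project", "us-central1", "1234567890")
print(name)  # projects/my-project/locations/us-central1/reasoningEngines/1234567890
```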
requests
Run the following code:
```python
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

response = requests.get(
    f"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID",
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {get_identity_token()}",
    },
)
```
 
 
REST
```shell
curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID
```
 
 
When using the Vertex AI SDK for Python, the agent object corresponds to an AgentEngine class that contains the following:
- an agent.api_resource with information about the deployed agent. You can also call agent.operation_schemas() to return the list of operations that the agent supports. See Supported operations for details.
- an agent.api_client that allows for synchronous service interactions.
- an agent.async_api_client that allows for asynchronous service interactions.
The rest of this section assumes that you have an instance named agent.
List supported operations
When developing the agent locally, you have access to and knowledge of the operations that it supports. To use a deployed agent, you can enumerate the operations that it supports:
Vertex AI SDK for Python
Run the following code:
```python
print(agent.operation_schemas())
```
 
 
requests
Run the following code:
```python
import json

json.loads(response.content).get("spec").get("classMethods")
```
 
 
REST
Represented in spec.class_methods from the response to the curl request.
The schema for each operation is a dictionary that documents a method of the agent that you can call. The set of supported operations depends on the framework you used to develop your agent. As an example, the following is the schema for the query operation of a LangchainAgent:
```python
{'api_mode': '',
 'name': 'query',
 'description': """Queries the Agent with the given input and config.

    Args:
        input (Union[str, Mapping[str, Any]]):
            Required. The input to be passed to the Agent.
        config (langchain_core.runnables.RunnableConfig):
            Optional. The config (if any) to be used for invoking the Agent.

    Returns:
        The output of querying the Agent with the given input and config.
    """,
 'parameters': {'$defs': {'RunnableConfig': {'description': 'Configuration for a Runnable.',
                                             'properties': {'configurable': {...},
                                                            'run_id': {...},
                                                            'run_name': {...},
                                                            ...},
                                             'type': 'object'}},
                'properties': {'config': {'nullable': True},
                               'input': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}},
                'required': ['input'],
                'type': 'object'}}
```
 
 
where:
- name is the name of the operation (e.g., agent.query for an operation named query).
- api_mode is the API mode of the operation ("" for synchronous, "stream" for streaming).
- description is a description of the operation based on the method's docstring.
- parameters is the schema of the input arguments in OpenAPI schema format.
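Because each schema carries both a name and an api_mode, you can partition a deployed agent's operations into synchronous and streaming groups. A sketch using hypothetical sample schemas (real ones come from agent.operation_schemas()):

```python
# Sample schemas are hypothetical stand-ins for agent.operation_schemas().
schemas = [
    {"api_mode": "", "name": "query"},
    {"api_mode": "stream", "name": "stream_query"},
]

# "" marks synchronous operations, "stream" marks streaming ones.
sync_ops = [s["name"] for s in schemas if s["api_mode"] == ""]
stream_ops = [s["name"] for s in schemas if s["api_mode"] == "stream"]
print(sync_ops, stream_ops)  # ['query'] ['stream_query']
```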
Query the agent using supported operations
For custom agents, you can use any of the following query or streaming operations you defined when developing your agent:
Note that certain frameworks only support specific query or streaming operations:
| Framework | Supported query operations |
|---|---|
| Agent Development Kit | async_stream_query |
| LangChain | query, stream_query |
| LangGraph | query, stream_query |
| AG2 | query |
| LlamaIndex | query |
Query the agent
Query the agent using the query operation:
Vertex AI SDK for Python
```python
agent.query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
)
```
 
 
requests
```python
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import json
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

requests.post(
    f"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:query",
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {get_identity_token()}",
    },
    data=json.dumps({
        "class_method": "query",
        "input": {
            "input": "What is the exchange rate from US dollars to Swedish Krona today?"
        }
    })
)
```
 
 
REST
```shell
curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:query \
-d '{
  "class_method": "query",
  "input": {
    "input": "What is the exchange rate from US dollars to Swedish Krona today?"
  }
}'
```
 
 
The query response is a string that is similar to the output of a local application test:
```python
{"input": "What is the exchange rate from US dollars to Swedish Krona today?",
 # ...
 "output": "For 1 US dollar you will get 10.7345 Swedish Krona."}
```
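When calling the REST endpoint directly, the response body can be decoded with the standard json module. A sketch with a hypothetical raw body shaped like the example above (a real one comes from response.content):

```python
import json

# Hypothetical raw response body; in practice this is response.content.
raw = ('{"input": "What is the exchange rate from US dollars to Swedish Krona today?",'
       ' "output": "For 1 US dollar you will get 10.7345 Swedish Krona."}')

result = json.loads(raw)
print(result["output"])  # For 1 US dollar you will get 10.7345 Swedish Krona.
```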
 
 
Stream responses from the agent
Stream a response from the agent using the stream_query operation:
Vertex AI SDK for Python
```python
agent = agent_engines.get("projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID")

for response in agent.stream_query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
):
    print(response)
```
 
 
requests
```python
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import json
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

requests.post(
    f"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:streamQuery",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {get_identity_token()}",
    },
    data=json.dumps({
        "class_method": "stream_query",
        "input": {
            "input": "What is the exchange rate from US dollars to Swedish Krona today?"
        },
    }),
    stream=True,
)
```
 
 
REST
```shell
curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:streamQuery?alt=sse \
-d '{
  "class_method": "stream_query",
  "input": {
    "input": "What is the exchange rate from US dollars to Swedish Krona today?"
  }
}'
```
 
 
Vertex AI Agent Engine streams responses as a sequence of iteratively generated objects. For example, a set of three responses might look like the following:
```python
{'actions': [{'tool': 'get_exchange_rate', ...}]}  # first response
{'steps': [{'action': {'tool': 'get_exchange_rate', ...}}]}  # second response
{'output': 'The exchange rate is 11.0117 SEK per USD as of 2024-12-03.'}  # final response
```
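With the requests approach (stream=True), each streamed chunk arrives as a JSON object on its own line. A sketch of parsing such chunks; the byte strings below simulate what response.iter_lines() would yield, and their contents are hypothetical:

```python
import json

# Simulated stream chunks; with requests you would iterate response.iter_lines().
chunks = [
    b'{"actions": [{"tool": "get_exchange_rate"}]}',
    b'{"output": "The exchange rate is 11.0117 SEK per USD as of 2024-12-03."}',
]

# Skip keep-alive blank lines and decode each chunk.
parsed = [json.loads(line) for line in chunks if line]

# The final chunk carries the agent's answer under "output".
final = next(p["output"] for p in parsed if "output" in p)
print(final)
```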
 
 
Asynchronously query the agent
If you defined an async_query operation when developing the agent, the Vertex AI SDK for Python supports client-side asynchronous querying of the agent:
Vertex AI SDK for Python
```python
agent = agent_engines.get("projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID")

response = await agent.async_query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
)
print(response)
```
 
 
The query response is a dictionary that is the same as the output of a local test:
```python
{"input": "What is the exchange rate from US dollars to Swedish Krona today?",
 # ...
 "output": "For 1 US dollar you will get 10.7345 Swedish Krona."}
```
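async_query must be awaited from inside a coroutine. A sketch of the calling pattern using a stub in place of the deployed agent (AgentStub and its canned answer are hypothetical; the real call is await agent.async_query(...)):

```python
import asyncio

# AgentStub is a hypothetical stand-in for the deployed agent object.
class AgentStub:
    async def async_query(self, input):
        return {"input": input, "output": "stubbed answer"}

async def main():
    agent = AgentStub()
    response = await agent.async_query(input="What is 1 USD in SEK?")
    return response["output"]

output = asyncio.run(main())
print(output)  # stubbed answer
```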
 
 
Asynchronously stream responses from the agent
If you defined an async_stream_query operation when developing the agent, you can asynchronously stream a response from the agent using one of its operations (for example, async_stream_query):
Vertex AI SDK for Python
```python
agent = agent_engines.get("projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID")

async for response in agent.async_stream_query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
):
    print(response)
```
 
 
The async_stream_query operation calls the same streamQuery endpoint under the hood and asynchronously streams responses as a sequence of iteratively generated objects. For example, a set of three responses might look like the following:
```python
{'actions': [{'tool': 'get_exchange_rate', ...}]}  # first response
{'steps': [{'action': {'tool': 'get_exchange_rate', ...}}]}  # second response
{'output': 'The exchange rate is 11.0117 SEK per USD as of 2024-12-03.'}  # final response
```
 
 
The responses should be the same as those generated during local testing.
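The async-for consumption pattern can be exercised locally against a stub async generator before pointing it at a deployed agent. A sketch (AgentStub and its chunks are hypothetical):

```python
import asyncio

# AgentStub mimics the shape of async_stream_query on a deployed agent.
class AgentStub:
    async def async_stream_query(self, input):
        for chunk in ({"actions": [{"tool": "get_exchange_rate"}]},
                      {"output": "done"}):
            yield chunk

async def collect():
    agent = AgentStub()
    results = []
    # Same async-for pattern as with the real agent object.
    async for response in agent.async_stream_query(input="example"):
        results.append(response)
    return results

chunks = asyncio.run(collect())
print(chunks[-1]["output"])  # done
```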

