The code for querying an agent is the same regardless of whether it is running locally or deployed remotely. Therefore, on this page, the term `agent` refers to either `local_agent` or `remote_agent` interchangeably. Because the set of supported operations varies across frameworks, we provide usage instructions for framework-specific templates:
| Framework | Description |
|---|---|
| Agent Development Kit (preview) | Designed based on Google's internal best practices for developers building AI applications or for teams needing to rapidly prototype and deploy robust agent-based solutions. |
| LangChain | Easier to use for basic use cases because of its predefined configurations and abstractions. |
| LangGraph | Graph-based approach to defining workflows, with advanced human-in-the-loop and rewind/replay capabilities. |
| AG2 (formerly AutoGen) | AG2 provides a multi-agent conversation framework as a high-level abstraction for building LLM workflows. |
| LlamaIndex (preview) | LlamaIndex's query pipeline offers a high-level interface for creating Retrieval-Augmented Generation (RAG) workflows. |
For custom agents that are not based on one of the framework-specific templates, you can follow these steps:
- User authentication.
- Get an agent instance.
- Look up supported operations.
- Query the agent.
- (If applicable) Stream responses from the agent.
Step 1: User authentication
Follow the same instructions as in setting up your environment.
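As a concrete sketch, initializing the Vertex AI SDK typically looks like the following. The project ID, location, and bucket below are placeholder assumptions, not values from this page:

```python
import vertexai

# Placeholder values -- substitute your own project, region, and staging bucket.
vertexai.init(
    project="PROJECT_ID",          # your Google Cloud project ID
    location="LOCATION",           # e.g. "us-central1"
    staging_bucket="gs://BUCKET",  # bucket used for staging agent artifacts
)
```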
Step 2: Get an instance of an agent
To query an agent, you first need an instance of an agent. You can either create a new instance or get an existing instance of an agent.
To get the agent corresponding to a specific resource ID:
Vertex AI SDK for Python
Run the following code:
```python
from vertexai import agent_engines

agent = agent_engines.get("RESOURCE_ID")
```
Alternatively, you can provide the full resource name of the agent:
```python
agent = agent_engines.get(
    "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)
```
requests
Run the following code:
```python
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

response = requests.get(
    f"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID",
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {get_identity_token()}",
    },
)
```
REST
```shell
curl \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
```
The rest of this section assumes that you have an instance named `agent`.
Step 3: Supported operations
When developing the agent locally, you know which operations it supports. To use a deployed agent, you can enumerate its supported operations:
Vertex AI SDK for Python
Run the following code:
```python
agent.operation_schemas()
```
requests
Run the following code:
```python
import json

json.loads(response.content).get("spec").get("classMethods")
```
REST
Represented in `spec.class_methods` from the response to the curl request.
The schema for each operation is a dictionary that documents the information of a method that you can call on the agent. The following command provides a list of schemas in JSON format that correspond to the operations of the `remote_app` object:
```python
agent.operation_schemas()
```
As an example, the following is the schema for the `query` operation of a `LangchainAgent`:
```python
{'api_mode': '',
 'name': 'query',
 'description': """Queries the Agent with the given input and config.

    Args:
        input (Union[str, Mapping[str, Any]]):
            Required. The input to be passed to the Agent.
        config (langchain_core.runnables.RunnableConfig):
            Optional. The config (if any) to be used for invoking the Agent.

    Returns:
        The output of querying the Agent with the given input and config.
    """,
 'parameters': {'$defs': {'RunnableConfig': {'description': 'Configuration for a Runnable.',
    'properties': {'configurable': {...},
     'run_id': {...},
     'run_name': {...},
     ...},
    'type': 'object'}},
  'properties': {'config': {'nullable': True},
   'input': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}},
  'required': ['input'],
  'type': 'object'}}
```
where:

- `name` is the name of the operation (i.e. `agent.query` for an operation named `query`).
- `api_mode` is the API mode of the operation (`""` for synchronous, `"stream"` for streaming).
- `description` is a description of the operation based on the method's docstring.
- `parameters` is the schema of the input arguments in OpenAPI schema format.
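These schema fields let you discover operations programmatically. As a minimal sketch, the following partitions a list of operation schemas by `api_mode`; the sample schemas below are illustrative placeholders shaped like `agent.operation_schemas()` output, not output from a real agent:

```python
# Sample operation schemas (illustrative placeholders, not real agent output).
schemas = [
    {"name": "query", "api_mode": "", "description": "Synchronous query."},
    {"name": "stream_query", "api_mode": "stream", "description": "Streaming query."},
]

# Partition operations by API mode: "" is synchronous, "stream" is streaming.
sync_ops = [s["name"] for s in schemas if s["api_mode"] == ""]
stream_ops = [s["name"] for s in schemas if s["api_mode"] == "stream"]

print(sync_ops)    # ['query']
print(stream_ops)  # ['stream_query']
```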
Step 4: Query the agent
To query the agent using one of its supported operations (e.g. `query`):
Vertex AI SDK for Python
```python
agent.query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
)
```
requests
```python
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import json
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

requests.post(
    f"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:query",
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {get_identity_token()}",
    },
    data=json.dumps({
        "class_method": "query",
        "input": {
            "input": "What is the exchange rate from US dollars to Swedish Krona today?"
        },
    }),
)
```
REST
```shell
curl \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:query" \
  -d '{
    "class_method": "query",
    "input": {
      "input": "What is the exchange rate from US dollars to Swedish Krona today?"
    }
  }'
```
The query response is a dictionary that is similar to the output of a local application test:
```python
{"input": "What is the exchange rate from US dollars to Swedish Krona today?",
 # ...
 "output": "For 1 US dollar you will get 10.7345 Swedish Krona."}
```
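When querying over HTTP, the JSON response body can be decoded the same way. A minimal sketch, assuming the body has the shape shown above (the sample bytes below are illustrative, not live output):

```python
import json

# A sample response body shaped like the query output above (illustrative).
body = b'{"input": "What is the exchange rate?", "output": "10.7345 SEK per USD."}'

# With a live call this would be: result = response.json()
result = json.loads(body)
print(result["output"])  # the agent's final answer
```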
Step 5: Stream responses from the agent
If applicable, you can stream a response from the agent using one of its operations (e.g. `stream_query`):
Vertex AI SDK for Python
```python
agent = agent_engines.get(
    "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)

for response in agent.stream_query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
):
    print(response)
```
requests
```python
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import json
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

requests.post(
    f"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:streamQuery",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {get_identity_token()}",
    },
    data=json.dumps({
        "class_method": "stream_query",
        "input": {
            "input": "What is the exchange rate from US dollars to Swedish Krona today?"
        },
    }),
    stream=True,
)
```
REST
```shell
curl \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:streamQuery?alt=sse" \
  -d '{
    "class_method": "stream_query",
    "input": {
      "input": "What is the exchange rate from US dollars to Swedish Krona today?"
    }
  }'
```
Vertex AI Agent Engine streams responses as a sequence of iteratively generated objects. For example, a set of three responses might look like the following:
```python
{'actions': [{'tool': 'get_exchange_rate', ...}]}  # first response
{'steps': [{'action': {'tool': 'get_exchange_rate', ...}}]}  # second response
{'output': 'The exchange rate is 11.0117 SEK per USD as of 2024-12-03.'}  # final response
```
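When streaming over HTTP with `stream=True`, each object arrives as a line of JSON. The sketch below parses a hypothetical sequence of such lines and keeps the final `output`; the sample lines mimic the three responses above rather than coming from a live agent:

```python
import json

# Hypothetical streamed lines, mimicking the three responses above.
lines = [
    b'{"actions": [{"tool": "get_exchange_rate"}]}',
    b'{"steps": [{"action": {"tool": "get_exchange_rate"}}]}',
    b'{"output": "The exchange rate is 11.0117 SEK per USD as of 2024-12-03."}',
]

final_output = None
for line in lines:  # with a live response, iterate response.iter_lines() instead
    chunk = json.loads(line)
    if "output" in chunk:
        final_output = chunk["output"]  # the last chunk carries the final answer

print(final_output)
```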
Step 6: Asynchronously query the agent
If you defined an `async_query` operation when developing the agent, the Vertex AI SDK for Python supports client-side asynchronous querying of the agent.
Vertex AI SDK for Python
```python
agent = agent_engines.get(
    "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)

response = await agent.async_query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
)
print(response)
```
The query response is a dictionary that is the same as the output of a local test :
```python
{"input": "What is the exchange rate from US dollars to Swedish Krona today?",
 # ...
 "output": "For 1 US dollar you will get 10.7345 Swedish Krona."}
```
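Because `async_query` is a coroutine, `await` only works inside an `async` function (or a notebook with a running event loop). In a plain script you would wrap the call with `asyncio.run`. The sketch below uses a stand-in coroutine to simulate the agent, since it does not call a deployed agent:

```python
import asyncio

async def fake_async_query(input):
    """Stand-in for agent.async_query -- returns a simulated response dict."""
    return {
        "input": input,
        "output": "For 1 US dollar you will get 10.7345 Swedish Krona.",
    }

async def main():
    # With a real agent: response = await agent.async_query(input=...)
    return await fake_async_query(
        input="What is the exchange rate from US dollars to Swedish Krona today?"
    )

# asyncio.run drives the event loop to completion from synchronous code.
response = asyncio.run(main())
print(response["output"])
```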
Step 7: Asynchronously stream responses from the agent
If you defined an `async_stream_query` operation when developing the agent, you can asynchronously stream responses from the agent:
Vertex AI SDK for Python
```python
agent = agent_engines.get(
    "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)

async for response in agent.async_stream_query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
):
    print(response)
```
The `async_stream_query` operation calls the same `streamQuery` endpoint under the hood and asynchronously streams responses as a sequence of iteratively generated objects. For example, a set of three responses might look like the following:
```python
{'actions': [{'tool': 'get_exchange_rate', ...}]}  # first response
{'steps': [{'action': {'tool': 'get_exchange_rate', ...}}]}  # second response
{'output': 'The exchange rate is 11.0117 SEK per USD as of 2024-12-03.'}  # final response
```
The responses should be the same as those generated during local testing .
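Similarly, `async_stream_query` returns an async generator, so it must be consumed with `async for` inside a coroutine. The sketch below simulates that pattern with a stand-in generator (not a real agent) and collects the responses into a list:

```python
import asyncio

async def fake_async_stream_query(input):
    """Stand-in for agent.async_stream_query -- yields simulated responses."""
    yield {"actions": [{"tool": "get_exchange_rate"}]}
    yield {"output": "The exchange rate is 11.0117 SEK per USD as of 2024-12-03."}

async def main():
    responses = []
    # With a real agent: async for response in agent.async_stream_query(input=...)
    async for response in fake_async_stream_query(
        input="What is the exchange rate from US dollars to Swedish Krona today?"
    ):
        responses.append(response)
    return responses

responses = asyncio.run(main())
print(responses[-1]["output"])  # the final chunk carries the answer
```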