The code for querying an agent is the same regardless of whether it is running locally or deployed remotely. Therefore, on this page, the term `agent` refers to either `local_agent` or `remote_agent` interchangeably. Because the set of supported operations varies across frameworks, we provide usage instructions for framework-specific templates:
| Framework | Description |
|---|---|
| Agent Development Kit (preview) | Designed based on Google's internal best practices for developers building AI applications or for teams needing to rapidly prototype and deploy robust agent-based solutions. |
| LangChain | Easier to use for basic use cases because of its predefined configurations and abstractions. |
| LangGraph | Graph-based approach to defining workflows, with advanced human-in-the-loop and rewind/replay capabilities. |
| AG2 (formerly AutoGen) | AG2 provides a multi-agent conversation framework as a high-level abstraction for building LLM workflows. |
| LlamaIndex (preview) | LlamaIndex's query pipeline offers a high-level interface for creating Retrieval-Augmented Generation (RAG) workflows. |
For custom agents that are not based on one of the framework-specific templates, you can follow these steps:
- User authentication.
- Get an agent instance.
- Look up supported operations.
- Query the agent.
- (If applicable) Stream responses from the agent.
Step 1: User authentication
Follow the same instructions as in setting up your environment.
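As a concrete sketch, initializing the Vertex AI SDK typically looks like the following. The project ID, location, and bucket below are placeholder assumptions, not values from this page:

```python
import vertexai

# Placeholder values -- substitute your own project, region, and staging bucket.
vertexai.init(
    project="PROJECT_ID",          # your Google Cloud project ID
    location="LOCATION",           # e.g. "us-central1"
    staging_bucket="gs://BUCKET",  # bucket used for staging agent artifacts
)
```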
Step 2: Get an instance of an agent
To query an agent, you first need an instance of an agent. You can either create a new instance or get an existing instance of an agent.
To get the agent corresponding to a specific resource ID:
Vertex AI SDK for Python
Run the following code:
```python
from vertexai import agent_engines

agent = agent_engines.get("RESOURCE_ID")
```
Alternatively, you can provide the full resource name of the agent:
```python
agent = agent_engines.get(
    "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)
```
requests
Run the following code:
```python
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

response = requests.get(
    f"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID",
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {get_identity_token()}",
    },
)
```
REST
```shell
curl \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
```
The rest of this section assumes that you have an instance named `agent`.
Step 3: Supported operations
When developing the agent locally, you know which operations it supports. To use a deployed agent, you can enumerate its supported operations:
Vertex AI SDK for Python
Run the following code:
```python
agent.operation_schemas()
```
requests
Run the following code:
```python
import json

json.loads(response.content).get("spec").get("classMethods")
```
REST
Represented in `spec.class_methods` from the response to the curl request.
The schema for each operation is a dictionary that documents the information of a method that you can call on the agent. The following command provides a list of schemas in JSON format that correspond to the operations of the `remote_app` object:
```python
agent.operation_schemas()
```
As an example, the following is the schema for the `query` operation of a `LangchainAgent`:
```python
{'api_mode': '',
 'name': 'query',
 'description': """Queries the Agent with the given input and config.

    Args:
        input (Union[str, Mapping[str, Any]]):
            Required. The input to be passed to the Agent.
        config (langchain_core.runnables.RunnableConfig):
            Optional. The config (if any) to be used for invoking the Agent.

    Returns:
        The output of querying the Agent with the given input and config.
    """,
 'parameters': {'$defs': {'RunnableConfig': {'description': 'Configuration for a Runnable.',
    'properties': {'configurable': {...},
     'run_id': {...},
     'run_name': {...},
     ...},
    'type': 'object'}},
  'properties': {'config': {'nullable': True},
   'input': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}},
  'required': ['input'],
  'type': 'object'}}
```
where:

- `name` is the name of the operation (i.e. `agent.query` for an operation named `query`).
- `api_mode` is the API mode of the operation (`""` for synchronous, `"stream"` for streaming).
- `description` is a description of the operation based on the method's docstring.
- `parameters` is the schema of the input arguments in OpenAPI schema format.
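These schema fields let you discover operations programmatically. As a minimal sketch, the following partitions a list of operation schemas by `api_mode`; the sample schemas below are illustrative placeholders shaped like `agent.operation_schemas()` output, not output from a real agent:

```python
# Sample operation schemas (illustrative placeholders, not real agent output).
schemas = [
    {"name": "query", "api_mode": "", "description": "Synchronous query."},
    {"name": "stream_query", "api_mode": "stream", "description": "Streaming query."},
]

# Partition operations by API mode: "" is synchronous, "stream" is streaming.
sync_ops = [s["name"] for s in schemas if s["api_mode"] == ""]
stream_ops = [s["name"] for s in schemas if s["api_mode"] == "stream"]

print(sync_ops)    # ['query']
print(stream_ops)  # ['stream_query']
```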
Step 4: Query the agent
To query the agent using one of its supported operations (e.g. `query`):
Vertex AI SDK for Python
```python
agent.query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
)
```
requests
```python
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import json
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

requests.post(
    f"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:query",
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {get_identity_token()}",
    },
    data=json.dumps({
        "class_method": "query",
        "input": {
            "input": "What is the exchange rate from US dollars to Swedish Krona today?"
        },
    }),
)
```
REST
```shell
curl \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:query" \
  -d '{
    "class_method": "query",
    "input": {
      "input": "What is the exchange rate from US dollars to Swedish Krona today?"
    }
  }'
```
The query response is a dictionary that is similar to the output of a local application test:
```python
{"input": "What is the exchange rate from US dollars to Swedish Krona today?",
 # ...
 "output": "For 1 US dollar you will get 10.7345 Swedish Krona."}
```
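When querying over HTTP, the JSON response body can be decoded the same way. A minimal sketch, assuming the body has the shape shown above (the sample bytes below are illustrative, not live output):

```python
import json

# A sample response body shaped like the query output above (illustrative).
body = b'{"input": "What is the exchange rate?", "output": "10.7345 SEK per USD."}'

# With a live call this would be: result = response.json()
result = json.loads(body)
print(result["output"])  # the agent's final answer
```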
Step 5: Stream responses from the agent
If applicable, you can stream a response from the agent using one of its operations (e.g. `stream_query`):
Vertex AI SDK for Python
```python
agent = agent_engines.get(
    "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)

for response in agent.stream_query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
):
    print(response)
```
requests
```python
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import json
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

requests.post(
    f"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:streamQuery",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {get_identity_token()}",
    },
    data=json.dumps({
        "class_method": "stream_query",
        "input": {
            "input": "What is the exchange rate from US dollars to Swedish Krona today?"
        },
    }),
    stream=True,
)
```
REST
```shell
curl \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:streamQuery?alt=sse" \
  -d '{
    "class_method": "stream_query",
    "input": {
      "input": "What is the exchange rate from US dollars to Swedish Krona today?"
    }
  }'
```
Vertex AI Agent Engine streams responses as a sequence of iteratively generated objects. For example, a set of three responses might look like the following:
```python
{'actions': [{'tool': 'get_exchange_rate', ...}]}  # first response
{'steps': [{'action': {'tool': 'get_exchange_rate', ...}}]}  # second response
{'output': 'The exchange rate is 11.0117 SEK per USD as of 2024-12-03.'}  # final response
```
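When streaming over HTTP with `stream=True`, each object arrives as a line of JSON. The sketch below parses a hypothetical sequence of such lines and keeps the final `output`; the sample lines mimic the three responses above rather than coming from a live agent:

```python
import json

# Hypothetical streamed lines, mimicking the three responses above.
lines = [
    b'{"actions": [{"tool": "get_exchange_rate"}]}',
    b'{"steps": [{"action": {"tool": "get_exchange_rate"}}]}',
    b'{"output": "The exchange rate is 11.0117 SEK per USD as of 2024-12-03."}',
]

final_output = None
for line in lines:  # with a live response, iterate response.iter_lines() instead
    chunk = json.loads(line)
    if "output" in chunk:
        final_output = chunk["output"]  # the last chunk carries the final answer

print(final_output)
```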
Step 6: Asynchronously query the agent
If you defined an `async_query` operation when developing the agent, the Vertex AI SDK for Python supports client-side asynchronous querying of the agent.
Vertex AI SDK for Python
```python
agent = agent_engines.get(
    "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)

response = await agent.async_query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
)
print(response)
```
The query response is a dictionary that is the same as the output of a local test :
```python
{"input": "What is the exchange rate from US dollars to Swedish Krona today?",
 # ...
 "output": "For 1 US dollar you will get 10.7345 Swedish Krona."}
```
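Because `async_query` is a coroutine, `await` only works inside an `async` function (or a notebook with a running event loop). In a plain script you would wrap the call with `asyncio.run`. The sketch below uses a stand-in coroutine to simulate the agent, since it does not call a deployed agent:

```python
import asyncio

async def fake_async_query(input):
    """Stand-in for agent.async_query -- returns a simulated response dict."""
    return {
        "input": input,
        "output": "For 1 US dollar you will get 10.7345 Swedish Krona.",
    }

async def main():
    # With a real agent: response = await agent.async_query(input=...)
    return await fake_async_query(
        input="What is the exchange rate from US dollars to Swedish Krona today?"
    )

# asyncio.run drives the event loop to completion from synchronous code.
response = asyncio.run(main())
print(response["output"])
```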
Step 7: Asynchronously stream responses from the agent
If you defined an `async_stream_query` operation when developing the agent, you can asynchronously stream responses from the agent:
Vertex AI SDK for Python
```python
agent = agent_engines.get(
    "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID"
)

async for response in agent.async_stream_query(
    input="What is the exchange rate from US dollars to Swedish Krona today?"
):
    print(response)
```
The `async_stream_query` operation calls the same `streamQuery` endpoint under the hood and asynchronously streams responses as a sequence of iteratively generated objects. For example, a set of three responses might look like the following:
```python
{'actions': [{'tool': 'get_exchange_rate', ...}]}  # first response
{'steps': [{'action': {'tool': 'get_exchange_rate', ...}}]}  # second response
{'output': 'The exchange rate is 11.0117 SEK per USD as of 2024-12-03.'}  # final response
```
The responses should be the same as those generated during local testing .
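Similarly, `async_stream_query` returns an async generator, so it must be consumed with `async for` inside a coroutine. The sketch below simulates that pattern with a stand-in generator (not a real agent) and collects the responses into a list:

```python
import asyncio

async def fake_async_stream_query(input):
    """Stand-in for agent.async_stream_query -- yields simulated responses."""
    yield {"actions": [{"tool": "get_exchange_rate"}]}
    yield {"output": "The exchange rate is 11.0117 SEK per USD as of 2024-12-03."}

async def main():
    responses = []
    # With a real agent: async for response in agent.async_stream_query(input=...)
    async for response in fake_async_stream_query(
        input="What is the exchange rate from US dollars to Swedish Krona today?"
    ):
        responses.append(response)
    return responses

responses = asyncio.run(main())
print(responses[-1]["output"])  # the final chunk carries the answer
```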