Call Gemini with the Chat Completions API
The following sample shows you how to send non-streaming requests:
REST
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
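A minimal sketch of the Python side, assuming the `openai` and `google-auth` packages are installed: the OpenAI client is pointed at the same `endpoints/openapi` URL as the REST sample and authenticated with an access token from Application Default Credentials. The helper names `openapi_base_url` and `ask_gemini` are hypothetical, not part of either library.

```python
"""Sketch: non-streaming Gemini request through the OpenAI Python SDK."""


def openapi_base_url(project_id: str, location: str) -> str:
    """Return the OpenAI-compatible base URL used by the REST sample.

    The OpenAI client appends "/chat/completions" to this base URL itself.
    """
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/endpoints/openapi"
    )


def ask_gemini(project_id: str, location: str, model_id: str, prompt: str) -> str:
    """Send one user message and return the reply text (hypothetical helper)."""
    # Imported lazily so that openapi_base_url works without these packages.
    import google.auth
    import google.auth.transport.requests
    import openai

    # Short-lived OAuth access token from Application Default Credentials.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(google.auth.transport.requests.Request())

    client = openai.OpenAI(
        base_url=openapi_base_url(project_id, location),
        api_key=credentials.token,
    )
    response = client.chat.completions.create(
        model=f"google/{model_id}",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

For example, `ask_gemini("PROJECT_ID", "us-central1", "MODEL_ID", "Write a story about a magic backpack.")` with real values substituted. Access tokens are short-lived, so a long-running process should refresh the credentials and rebuild the client rather than reuse one token indefinitely.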
The following sample shows you how to send streaming requests to a Gemini model by using the Chat Completions API:
REST
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/'"${MODEL_ID}"'",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
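With `stream=True`, the OpenAI Python SDK returns an iterator of chunks instead of one response. A minimal sketch of consuming it, assuming `client` is an `openai.OpenAI` instance already configured with the Vertex AI base URL and an access token; `iter_text`, `stream_story`, and the offline stand-in `fake_chunk` are hypothetical helpers.

```python
"""Sketch: consuming a streaming Chat Completions response."""
from types import SimpleNamespace
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text deltas from OpenAI-style streaming chunks."""
    for chunk in chunks:
        if not chunk.choices:
            continue  # e.g. a trailing chunk that carries only usage data
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None on role-only or final chunks
            yield delta


def stream_story(client, model_id: str, prompt: str) -> Iterator[str]:
    """Start a streaming request and return an iterator over the text."""
    stream = client.chat.completions.create(
        model=f"google/{model_id}",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return iter_text(stream)


def fake_chunk(text):
    """Offline stand-in for one streamed chunk (for local testing only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    chunks = [fake_chunk("Once "), fake_chunk(None), fake_chunk("upon a time")]
    print("".join(iter_text(chunks)))  # prints "Once upon a time"
```

To print text as it arrives, loop over `stream_story(...)` and call `print(piece, end="", flush=True)` for each piece.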
Send a prompt and an image to the Gemini API in Vertex AI
Python
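A minimal sketch of a text-plus-image prompt. The image part copies the shape of the `image_url` curl example under Multimodal requests, which passes a Cloud Storage URI directly as the value of `"image_url"`; OpenAI's own schema nests the URL as `{"url": ...}`, and whether Vertex AI accepts that form too is not stated here. `image_prompt` is a hypothetical helper.

```python
"""Sketch: a text + image user message for the Chat Completions API."""


def image_prompt(text: str, image_uri: str) -> dict:
    """Return one user message with a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # Same shape as the image_url curl example: a gs:// URI string.
            {"type": "image_url", "image_url": image_uri},
        ],
    }


message = image_prompt(
    "Describe this image",
    "gs://cloud-samples-data/generative-ai/image/scones.jpg",
)

# With an openai.OpenAI client pointed at the Vertex AI endpoint:
# response = client.chat.completions.create(
#     model="google/gemini-2.0-flash-001", messages=[message]
# )
```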
Call a self-deployed model with the Chat Completions API
The following sample shows you how to send non-streaming requests:
REST
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions" \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
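A minimal standard-library sketch of the same request. The URL and body are copied from the REST sample; because the endpoint serves a single deployed model, the body has no `model` field. `endpoint_chat_request` and `send` are hypothetical helpers, and `token` is assumed to be an OAuth access token such as the output of `gcloud auth print-access-token`.

```python
"""Sketch: calling a self-deployed endpoint's chat/completions route."""
import json
import urllib.request


def endpoint_chat_request(project_id: str, endpoint: str, token: str,
                          messages: list,
                          stream: bool = False) -> urllib.request.Request:
    """Build the POST request from the REST sample (no "model" field)."""
    url = (
        "https://aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/global/endpoints/{endpoint}/chat/completions"
    )
    body = {"messages": messages}
    if stream:
        body["stream"] = True
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a non-streaming request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example: `send(endpoint_chat_request(PROJECT_ID, ENDPOINT, token, [{"role": "user", "content": "Write a story about a magic backpack."}]))`.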
The following sample shows you how to send streaming requests to a self-deployed model by using the Chat Completions API:
REST
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions" \
  -d '{
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
Python
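A minimal standard-library sketch for reading the streamed response, assuming the endpoint follows the OpenAI streaming convention of server-sent events: one `data: {...}` JSON chunk per line and a final `data: [DONE]` sentinel. `parse_sse` and `delta_text` are hypothetical helpers.

```python
"""Sketch: parsing a streamed chat/completions response sent as SSE."""
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield each JSON event from "data: ..." lines, stopping at "[DONE]"."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def delta_text(events: Iterable[dict]) -> str:
    """Concatenate the text deltas of the first choice in each event."""
    parts = []
    for event in events:
        for choice in event.get("choices", [])[:1]:
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

Iterating an HTTP response from `urllib.request.urlopen` yields raw byte lines, so `delta_text(parse_sse(resp))` works directly on it; loop over `parse_sse(resp)` instead to show text as it arrives.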
extra_body examples
You can use either the SDK or the REST API to pass in extra_body.
Add thought_tag_marker
{
  ...,
  "extra_body": {
    "google": {
      ...,
      "thought_tag_marker": "..."
    }
  }
}
Add extra_body using the SDK
client.chat.completions.create(
    ...,
    extra_body={
        'extra_body': {
            'google': {
                ...
            }
        }
    },
)
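The doubled `extra_body` key follows from how the OpenAI Python SDK treats its `extra_body` argument: the SDK merges that dict into the top-level request JSON, so the request body ends up containing `"extra_body": {"google": {...}}`, the same shape as the REST form above. A minimal sketch of building that argument; `google_extra_body` is a hypothetical helper, and the `thinking_config` and `thought_tag_marker` values are copied from the thinking_config curl sample.

```python
"""Sketch: assembling the extra_body argument for chat.completions.create."""


def google_extra_body(**google_fields) -> dict:
    """Wrap Google-specific fields the way the SDK snippet does."""
    return {"extra_body": {"google": dict(google_fields)}}


kwargs = google_extra_body(
    thinking_config={"include_thoughts": True, "thinking_budget": 10000},
    thought_tag_marker="think",
)

# Hypothetical call, with `client` an openai.OpenAI instance for Vertex AI:
# client.chat.completions.create(model=..., messages=..., extra_body=kwargs)
```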
extra_content examples
You can populate this field by using the REST API directly.
extra_content with string content
{
  "messages": [
    {
      "role": "...",
      "content": "...",
      "extra_content": {
        "google": {
          ...
        }
      }
    }
  ]
}
Per-message extra_content
{
  "messages": [
    {
      "role": "...",
      "content": [
        {
          "type": "...",
          ...,
          "extra_content": {
            "google": {
              ...
            }
          }
        }
      ]
    }
  ]
}
Per-tool call extra_content
{
  "messages": [
    {
      "role": "...",
      "tool_calls": [
        {
          ...,
          "extra_content": {
            "google": {
              ...
            }
          }
        }
      ]
    }
  ]
}
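The three shapes above differ only in where `extra_content` sits: on a message, on a content part, or on a tool call. A minimal sketch of a helper that attaches it at any of those levels; `with_google_extra` is hypothetical, and the key `some_field` and the tool call's `id` and function name are placeholders, since the accepted `google` fields are not listed here.

```python
"""Sketch: attaching Google-specific extra_content at three levels."""
import copy


def with_google_extra(obj: dict, google_fields: dict) -> dict:
    """Return a copy of a message, content part, or tool call with
    {"extra_content": {"google": google_fields}} added."""
    result = copy.deepcopy(obj)  # leave the caller's dict untouched
    result["extra_content"] = {"google": dict(google_fields)}
    return result


fields = {"some_field": "value"}  # hypothetical placeholder key

per_message = with_google_extra({"role": "user", "content": "Hello"}, fields)

per_part = {
    "role": "user",
    "content": [with_google_extra({"type": "text", "text": "Hello"}, fields)],
}

per_tool_call = {
    "role": "assistant",
    "tool_calls": [
        with_google_extra(
            {
                "id": "call_1",  # hypothetical
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{}"},
            },
            fields,
        )
    ],
}
```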
Sample curl requests
You can use these curl requests directly, rather than going through the SDK.
Use thinking_config with extra_body
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/gemini-2.5-flash-preview-04-17",
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Are there any prime numbers of the form n*ceil(log(n))?"
      }]
    }],
    "extra_body": {
      "google": {
        "thinking_config": {
          "include_thoughts": true,
          "thinking_budget": 10000
        },
        "thought_tag_marker": "think"
      }
    },
    "stream": true
  }'
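With `thought_tag_marker` set to `"think"`, the model's thoughts come back alongside the answer text. A minimal sketch of separating them on the client, assuming (not confirmed here) that each thought span is wrapped as `<think>...</think>`; `split_thoughts` is a hypothetical helper. For a streamed response, join the chunks' text first, because a tag can be split across chunks.

```python
"""Sketch: separating tagged thoughts from the answer text."""
import re


def split_thoughts(text: str, marker: str = "think") -> tuple[str, str]:
    """Return (thoughts, answer), assuming thoughts arrive as <marker>...</marker>."""
    tag = re.escape(marker)
    pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
    # Every tagged span is treated as thought text, joined by newlines.
    thoughts = "\n".join(part.strip() for part in pattern.findall(text))
    # Whatever remains after removing the tagged spans is the answer.
    answer = pattern.sub("", text).strip()
    return thoughts, answer
```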
Multimodal requests
The Chat Completions API supports a variety of multimodal inputs, including images, audio, and video.
Use image_url to pass in image data
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image" },
        { "type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg" }
      ]
    }]
  }'
Use input_audio to pass in audio data
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions" \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this: " },
        { "type": "input_audio", "input_audio": {
            "format": "audio/mp3",
            "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3" } }
      ]
    }]
  }'
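A minimal sketch of building the same `input_audio` part in Python. `audio_part_from_uri` mirrors the curl example (a Cloud Storage URI in `"data"` and a MIME type in `"format"`). `audio_part_from_bytes` rests on an assumption not stated in this section: that inline audio can be sent as base64 text in `"data"`, as in OpenAI's own schema. Both helpers are hypothetical.

```python
"""Sketch: building input_audio content parts."""
import base64


def audio_part_from_uri(uri: str, mime_type: str = "audio/mp3") -> dict:
    """Mirror the curl example: a Cloud Storage URI in "data"."""
    return {"type": "input_audio", "input_audio": {"format": mime_type, "data": uri}}


def audio_part_from_bytes(audio: bytes, mime_type: str = "audio/mp3") -> dict:
    """Assumption: inline audio sent as base64 text in "data"."""
    encoded = base64.b64encode(audio).decode("ascii")
    return {"type": "input_audio", "input_audio": {"format": mime_type, "data": encoded}}


message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this: "},
        audio_part_from_uri("gs://cloud-samples-data/generative-ai/audio/pixel.mp3"),
    ],
}
```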
Structured output
You can use the response_format parameter to get structured output.
Example using SDK
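import google.auth
import google.auth.transport.requests
from openai import OpenAI
from pydantic import BaseModel

PROJECT_ID = "your-project-id"  # replace with your Google Cloud project ID
LOCATION = "us-central1"

# Get an access token from Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Point the OpenAI client at the Vertex AI OpenAI-compatible endpoint
# instead of the default OpenAI URL.
client = OpenAI(
    base_url=(
        f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
        f"/locations/{LOCATION}/endpoints/openapi"
    ),
    api_key=credentials.token,
)


class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]


completion = client.beta.chat.completions.parse(
    model="google/gemini-2.5-flash-preview-04-17",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

# The parsed field holds a CalendarEvent instance.
print(completion.choices[0].message.parsed)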
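If pydantic is not available, the same schema can be written by hand. A minimal sketch, assuming the endpoint accepts OpenAI's `{"type": "json_schema", ...}` form of `response_format` (not confirmed in this section) and returns the JSON object as the message text; `CALENDAR_EVENT_FORMAT` and `parse_event` are hypothetical names.

```python
"""Sketch: structured output with a hand-written JSON schema."""
import json
from dataclasses import dataclass

CALENDAR_EVENT_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "calendar_event",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "date": {"type": "string"},
                "participants": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["name", "date", "participants"],
        },
    },
}


@dataclass
class CalendarEvent:
    name: str
    date: str
    participants: list[str]


def parse_event(content: str) -> CalendarEvent:
    """Decode the model's JSON reply (message.content) into a CalendarEvent."""
    data = json.loads(content)
    return CalendarEvent(data["name"], data["date"], list(data["participants"]))


# Hypothetical call:
# completion = client.chat.completions.create(
#     model=..., messages=..., response_format=CALENDAR_EVENT_FORMAT
# )
# event = parse_event(completion.choices[0].message.content)
```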
What's next
- See examples of calling the Inference API with the OpenAI-compatible syntax.
- See examples of calling the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.