Examples

Call Gemini with the Chat Completions API

The following sample shows you how to send non-streaming requests:

REST

  
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
  

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
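
For local development, one common way to set up Application Default Credentials is through the gcloud CLI (shown here as a convenience; see the authentication guide for other options):

gcloud auth application-default login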

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)
 

The following sample shows you how to send streaming requests to a Gemini model by using the Chat Completions API:

REST

  
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/${MODEL_ID}",
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
  

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)
 

Send a prompt and an image to the Gemini API in Vertex AI

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the following image:"},
                {
                    "type": "image_url",
                    "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg",
                },
            ],
        }
    ],
)
print(response)
 

Call a self-deployed model with the Chat Completions API

The following sample shows you how to send non-streaming requests:

REST

  
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response)
 

The following sample shows you how to send streaming requests to a self-deployed model by using the Chat Completions API:

REST

  
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/global/endpoints/${ENDPOINT}/chat/completions \
  -d '{
    "stream": true,
    "messages": [{
      "role": "user",
      "content": "Write a story about a magic backpack."
    }]
  }'
  

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.auth import default
import google.auth.transport.requests

import openai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
# location = "us-central1"
# model_id = "gemma-2-9b-it"
# endpoint_id = "YOUR_ENDPOINT_ID"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
    base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/{endpoint_id}",
    api_key=credentials.token,
)

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in response:
    print(chunk)
 

extra_body examples

You can use either the SDK or the REST API to pass in extra_body.

Add thought_tag_marker

{
  ...,
  "extra_body": {
    "google": {
      ...,
      "thought_tag_marker": "..."
    }
  }
}
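
As a concrete instance of this skeleton, the sample curl request later on this page sets the marker to "think":

{
  "extra_body": {
    "google": {
      "thought_tag_marker": "think"
    }
  }
}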
 

Add extra_body using the SDK

client.chat.completions.create(
    ...,
    extra_body = {
        'extra_body': {
            'google': {
                ...
            }
        }
    },
)
 

extra_content examples

You can populate this field by using the REST API directly.

extra_content with string content

{
  "messages": [
    {
      "role": "...",
      "content": "...",
      "extra_content": {
        "google": {
          ...
        }
      }
    }
  ]
}
 

Per-message extra_content

{
  "messages": [
    {
      "role": "...",
      "content": [
        {
          "type": "...",
          ...,
          "extra_content": {
            "google": {
              ...
            }
          }
        }
      ]
    }
  ]
}
 

Per-tool call extra_content

{
  "messages": [
    {
      "role": "...",
      "tool_calls": [
        {
          ...,
          "extra_content": {
            "google": {
              ...
            }
          }
        }
      ]
    }
  ]
}
 

Sample curl requests

You can use these curl requests directly, rather than going through the SDK.

Use thinking_config with extra_body

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.5-flash-preview-04-17",
    "messages": [{
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Are there any prime numbers of the form n*ceil(log(n))?"
      }]
    }],
    "extra_body": {
      "google": {
        "thinking_config": {
          "include_thoughts": true,
          "thinking_budget": 10000
        },
        "thought_tag_marker": "think"
      }
    },
    "stream": true
  }'
 

Multimodal requests

The Chat Completions API supports a variety of multimodal input, including both audio and video.

Use image_url to pass in image data

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this image" },
        { "type": "image_url", "image_url": "gs://cloud-samples-data/generative-ai/image/scones.jpg" }
      ]
    }]
  }'
 

Use input_audio to pass in audio data

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.0-flash-001",
    "messages": [{
      "role": "user",
      "content": [
        { "type": "text", "text": "Describe this: " },
        { "type": "input_audio", "input_audio": {
          "format": "audio/mp3",
          "data": "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
        } }
      ]
    }]
  }'
 

Structured output

You can use the response_format parameter to get structured output.

Example using SDK

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="google/gemini-2.5-flash-preview-04-17",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)
print(completion.choices[0].message.parsed)
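
The response_format parameter can also be supplied over REST. The following is a sketch, assuming the endpoint accepts the same json_schema wire format that the OpenAI API uses; the schema mirrors the CalendarEvent model from the SDK example:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/openapi/chat/completions \
  -d '{
    "model": "google/gemini-2.5-flash-preview-04-17",
    "messages": [
      { "role": "system", "content": "Extract the event information." },
      { "role": "user", "content": "Alice and Bob are going to a science fair on Friday." }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "calendar_event",
        "schema": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "date": { "type": "string" },
            "participants": { "type": "array", "items": { "type": "string" } }
          },
          "required": ["name", "date", "participants"]
        }
      }
    }
  }'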
 
