To use the OpenAI Python library, install the OpenAI SDK:

pip install openai
To authenticate with the Chat Completions API, you can either modify your client setup or change your environment configuration to use Google authentication and a Vertex AI endpoint. Choose whichever method is easier, and follow the corresponding setup steps depending on whether you want to call Gemini models or self-deployed Model Garden models.
Certain models in Model Garden and supported Hugging Face models must be deployed to a Vertex AI endpoint before they can serve requests. When calling these self-deployed models from the Chat Completions API, you need to specify the endpoint ID. To list your existing Vertex AI endpoints, use the gcloud ai endpoints list command.
Client setup
To programmatically get Google credentials in Python, you can use the google-auth Python SDK:

pip install google-auth requests
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
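As a sketch of what this setup can look like, the helper below fetches a short-lived access token through Application Default Credentials and builds the Gemini base URL described in the environment-variable section. Treat the exact wiring as illustrative, not the canonical sample:

```python
import os


def build_base_url(project_id: str, location: str) -> str:
    # Gemini models are served from the shared "openapi" endpoint.
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/endpoints/openapi"
    )


def get_access_token() -> str:
    # Fetch a short-lived OAuth access token via Application Default
    # Credentials. Imported lazily so the URL helper above works even
    # where google-auth is not installed.
    import google.auth
    import google.auth.transport.requests

    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(google.auth.transport.requests.Request())
    return credentials.token


# Usage (assumes ADC is configured and the openai package is installed):
# import openai
# client = openai.OpenAI(
#     api_key=get_access_token(),
#     base_url=build_base_url(os.environ["PROJECT_ID"], os.environ["LOCATION"]),
# )
```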
By default, service account access tokens last for 1 hour. You can extend the life of service account access tokens, or periodically refresh your token and update the openai.api_key variable.
Environment variables
Install the Google Cloud CLI. The OpenAI library can read the OPENAI_API_KEY and OPENAI_BASE_URL environment variables to change the authentication and endpoint in its default client.
Set the following variables:
$ export PROJECT_ID=PROJECT_ID
$ export LOCATION=LOCATION
$ export OPENAI_API_KEY="$(gcloud auth application-default print-access-token)"
To call a Gemini model, set the MODEL_ID variable and use the openapi endpoint:

$ export MODEL_ID=MODEL_ID
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi"
To call a self-deployed model from Model Garden, set the ENDPOINT variable and use it in the URL instead:

$ export ENDPOINT=ENDPOINT_ID
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT}"
Next, initialize the client:

client = openai.OpenAI()
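With OPENAI_API_KEY and OPENAI_BASE_URL set, requests then use the standard OpenAI chat syntax. A minimal sketch, where "google/MODEL_ID" is a placeholder rather than a real model name:

```python
def build_chat_request(model: str, prompt: str) -> dict:
    # Keyword arguments for client.chat.completions.create().
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


# Usage (requires the environment variables above to be set):
# import openai
# client = openai.OpenAI()
# response = client.chat.completions.create(
#     **build_chat_request("google/MODEL_ID", "Hello!")
# )
# print(response.choices[0].message.content)
```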
 
 
The Gemini Chat Completions API uses OAuth to authenticate with a short-lived access token.
Refresh your credentials
The following example shows how to refresh your credentials automatically as needed:
Python
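A sketch of one approach, using google-auth's standard refresh mechanism; the helper name and wiring are illustrative, not the canonical sample:

```python
def fresh_token(credentials) -> str:
    # Refresh the credentials when the current token is missing or
    # expired, then return a valid access token. Works with any object
    # exposing the google.auth credentials interface (valid, token,
    # refresh()).
    if not credentials.valid:
        # Imported lazily so the helper can be exercised with fakes.
        from google.auth.transport.requests import Request

        credentials.refresh(Request())
    return credentials.token


# Usage (assumes ADC is configured):
# import google.auth
# import openai
# credentials, _ = google.auth.default(
#     scopes=["https://www.googleapis.com/auth/cloud-platform"]
# )
# client = openai.OpenAI(api_key=fresh_token(credentials))
# # Before each batch of requests, keep the key current:
# client.api_key = fresh_token(credentials)
```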
What's next
- See examples of calling the Chat Completions API with the OpenAI-compatible syntax.
- See examples of calling the Inference API with the OpenAI-compatible syntax.
- See examples of calling the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.

