To use the OpenAI Python libraries, install the OpenAI SDK:

```shell
pip install openai
```
To authenticate with the Chat Completions API, you can either modify your client setup or change your environment configuration to use Google authentication and a Vertex AI endpoint. Choose whichever method is easier, and then follow the setup steps depending on whether you want to call Gemini models or self-deployed Model Garden models.
Certain models in Model Garden and supported Hugging Face models must be deployed to a Vertex AI endpoint before they can serve requests. When calling these self-deployed models from the Chat Completions API, you need to specify the endpoint ID. To list your existing Vertex AI endpoints, use the gcloud ai endpoints list command.
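For example, listing the endpoints in one region can look like the following (PROJECT_ID and the region value are placeholders; substitute your own):

```shell
# List Vertex AI endpoints in a region; the ID column holds the endpoint ID
# you pass to the Chat Completions API for self-deployed models.
gcloud ai endpoints list \
    --project=PROJECT_ID \
    --region=us-central1
```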
Client setup
To programmatically get Google credentials in Python, you can use the google-auth Python SDK:

```shell
pip install google-auth requests
```
Python
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
By default, service account access tokens last for 1 hour. You can extend the life of service account access tokens, or periodically refresh your token and update the openai.api_key variable.
Environment variables
Install the Google Cloud CLI. The OpenAI library can read the OPENAI_API_KEY and OPENAI_BASE_URL environment variables to change the authentication and endpoint in its default client.
Set the following variables:
```shell
$ export PROJECT_ID=PROJECT_ID
$ export LOCATION=LOCATION
$ export OPENAI_API_KEY="$(gcloud auth application-default print-access-token)"
```
To call a Gemini model, set the MODEL_ID variable and use the openapi endpoint:
```shell
$ export MODEL_ID=MODEL_ID
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/openapi"
```
To call a self-deployed model from Model Garden, set the ENDPOINT variable and use it in your URL instead:
```shell
$ export ENDPOINT=ENDPOINT_ID
$ export OPENAI_BASE_URL="https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT}"
```
Next, initialize the client:

```python
import openai

# With OPENAI_API_KEY and OPENAI_BASE_URL set, the default client
# needs no constructor arguments.
client = openai.OpenAI()
```
The Gemini Chat Completions API uses OAuth to authenticate with a short-lived access token.
Refresh your credentials
The following example shows how to refresh your credentials automatically as needed:
Python
What's next
- See examples of calling the Chat Completions API with the OpenAI-compatible syntax.
- See examples of calling the Inference API with the OpenAI-compatible syntax.
- See examples of calling the Function Calling API with OpenAI-compatible syntax.
- Learn more about the Gemini API.
- Learn more about migrating from Azure OpenAI to the Gemini API.

