OpenAI models are available on Vertex AI as managed APIs and as self-deployed models. You can stream responses to reduce end-user latency perception: a streamed response uses server-sent events (SSE) to return output incrementally as the model generates it.
Managed OpenAI models
OpenAI offers fully managed, serverless models as APIs. To use an OpenAI model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because managed OpenAI models are served as APIs, there's no need to provision or manage infrastructure.
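For illustration, a minimal non-streaming request to a managed model might look like the following sketch. The endpoint path and the model ID `openai/gpt-oss-120b-maas` are assumptions; check the model's Model Garden card for the exact model ID and supported regions.

```bash
# A minimal sketch of a non-streaming chat completions request to a
# managed OpenAI model. PROJECT_ID and REGION are placeholders, and the
# endpoint path and "openai/gpt-oss-120b-maas" model ID are assumptions;
# confirm both on the model card in Model Garden.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/endpoints/openapi/chat/completions" \
  -d '{
    "model": "openai/gpt-oss-120b-maas",
    "messages": [
      {"role": "user", "content": "Summarize server-sent events in one sentence."}
    ]
  }'
```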
The following models are available from OpenAI to use in Vertex AI. To access an OpenAI model, go to its Model Garden model card.
gpt-oss 120B
OpenAI gpt-oss 120B is a 120B-parameter open-weight language model released under the Apache 2.0 license. It is well suited for reasoning and function calling use cases, and is optimized for production, general-purpose, high-reasoning workloads that fit on a single high-end GPU.
The 120B model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running on a single 80 GB GPU.
Go to the gpt-oss 120B model card
gpt-oss 20B
OpenAI gpt-oss 20B is a 20B-parameter open-weight language model released under the Apache 2.0 license. It is well suited for reasoning and function calling use cases. The model is optimized for deployment on consumer hardware.
The 20B model delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure.
Go to the gpt-oss 20B model card
Use OpenAI models
For managed models, you can use curl commands to send requests to the Vertex AI endpoint using the model names listed above. To learn how to make streaming and non-streaming calls to OpenAI models, see Call open model APIs.
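As a sketch, a streaming call differs from the non-streaming request above only by setting "stream": true, after which the endpoint returns SSE chunks instead of a single JSON response. The endpoint path and model ID are again assumed placeholders.

```bash
# A hedged sketch of a streaming call: "stream": true asks the endpoint
# to return server-sent events (SSE) chunks incrementally. The endpoint
# path and "openai/gpt-oss-20b-maas" model ID are assumptions; verify
# them on the model card.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/endpoints/openapi/chat/completions" \
  -d '{
    "model": "openai/gpt-oss-20b-maas",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about streaming."}
    ]
  }'
```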
To use a self-deployed Vertex AI model:
- Navigate to the Model Garden console.
- Find the relevant Vertex AI model.
- Click Enable and complete the provided form to get the necessary commercial use licenses.
For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.
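As an illustrative sketch, once a model is deployed to an endpoint, you might send an OpenAI-style chat completions payload through the endpoint's rawPredict method. The ENDPOINT_ID placeholder and the assumption that the serving container accepts this payload shape are hypothetical; follow the deployment guide for the exact request format.

```bash
# A hedged sketch of a prediction request to a self-deployed endpoint via
# rawPredict. ENDPOINT_ID is a placeholder, and the OpenAI-compatible
# chat completions body assumes the serving container (e.g. vLLM)
# accepts it -- see Deploy a partner model and make prediction requests
# for the authoritative format.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:rawPredict" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 128
  }'
```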
What's next
- Learn how to Call open model APIs.

