GLM models on Vertex AI are offered as fully managed, serverless APIs. To use a GLM model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because GLM models use a managed API, there's no need to provision or manage infrastructure.
You can stream responses to reduce perceived latency for end users. A streamed response uses server-sent events (SSE) to deliver the response incrementally.
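For example, a streamed request sets "stream": true in the request body, and the chunks arrive as SSE data: lines. The following is a minimal sketch only: it assumes the OpenAI-compatible chat completions path described in Call open model APIs, the global endpoint, and glm-4.7-maas as the model identifier; confirm the exact URL and model string against that page and the model card.

```bash
# Streaming chat request (sketch): "stream": true makes the endpoint
# return incremental chunks as server-sent events (data: lines).
# The endpoint path and model identifier below are assumptions; verify
# them against the Call open model APIs page and the GLM model card.
PROJECT_ID="your-project-id"

curl -N -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/global/endpoints/openapi/chat/completions" \
  -d '{
    "model": "glm-4.7-maas",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Summarize server-sent events in one sentence."}
    ]
  }'
```

The -N flag disables curl's output buffering so each SSE chunk is printed as soon as it arrives.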
GLM 4.7
GLM 4.7 is a GLM-family model designed for coding, from core development workflows to vibe coding, as well as tool use and complex reasoning.
Go to the GLM 4.7 model card

Use GLM models
For managed models, you can use curl commands to send requests to the Vertex AI endpoint using the following model names:
- For GLM 4.7, use glm-4.7-maas
To learn how to make streaming and non-streaming calls to GLM models, see Call open model APIs.
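As a concrete illustration, the following sketch sends a non-streaming chat request; it differs from the streaming sketch earlier on this page only by omitting "stream": true. The endpoint URL (shown for the global region) and the model identifier are assumptions, not confirmed values; some partner models require a publisher prefix, so check the model card and the Call open model APIs page.

```bash
# Non-streaming chat request to the managed GLM endpoint (sketch).
# Assumes: gcloud is authenticated, the global OpenAI-compatible
# chat completions path, and "glm-4.7-maas" as the model identifier;
# verify both against the Call open model APIs page and the model card.
PROJECT_ID="your-project-id"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/global/endpoints/openapi/chat/completions" \
  -d '{
    "model": "glm-4.7-maas",
    "messages": [
      {"role": "user", "content": "Write a function that reverses a linked list."}
    ]
  }'
```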
To use a self-deployed Vertex AI model:
- Navigate to the Model Garden console.
- Find the relevant Vertex AI model.
- Click Enable and complete the provided form to get the necessary commercial use licenses.
For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.
GLM model region availability
GLM models are available in the following regions:
- global
  - Context length: 200,000 tokens
  - Max output: 128,000 tokens
What's next
Learn how to Call open model APIs.

