GLM models on Vertex AI are offered as fully managed, serverless APIs. To use a GLM model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because GLM models use a managed API, there's no need to provision or manage infrastructure.
You can stream responses to reduce perceived latency for end users. A streamed response uses server-sent events (SSE) to deliver the response incrementally.
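For example, a streamed request sets "stream": true in the request body, and the chunks arrive as SSE data: lines. The following is a minimal sketch only: it assumes the OpenAI-compatible chat completions path described in Call open model APIs, the global endpoint, and glm-4.7-maas as the model identifier; confirm the exact URL and model string against that page and the model card.

```bash
# Streaming chat request (sketch): "stream": true makes the endpoint
# return incremental chunks as server-sent events (data: lines).
# The endpoint path and model identifier below are assumptions; verify
# them against the Call open model APIs page and the GLM model card.
PROJECT_ID="your-project-id"

curl -N -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/global/endpoints/openapi/chat/completions" \
  -d '{
    "model": "glm-4.7-maas",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Summarize server-sent events in one sentence."}
    ]
  }'
```

The -N flag disables curl's output buffering so each SSE chunk is printed as soon as it arrives.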
GLM 4.7
GLM 4.7 is a GLM-family model designed for coding, from core development workflows to vibe coding, as well as tool use and complex reasoning.
Go to the GLM 4.7 model card

Use GLM models
For managed models, you can use curl commands to send requests to the Vertex AI endpoint using the following model names:
- For GLM 4.7, use glm-4.7-maas
To learn how to make streaming and non-streaming calls to GLM models, see Call open model APIs.
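As a concrete illustration, the following sketch sends a non-streaming chat request; it differs from the streaming sketch earlier on this page only by omitting "stream": true. The endpoint URL (shown for the global region) and the model identifier are assumptions, not confirmed values; some partner models require a publisher prefix, so check the model card and the Call open model APIs page.

```bash
# Non-streaming chat request to the managed GLM endpoint (sketch).
# Assumes: gcloud is authenticated, the global OpenAI-compatible
# chat completions path, and "glm-4.7-maas" as the model identifier;
# verify both against the Call open model APIs page and the model card.
PROJECT_ID="your-project-id"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/global/endpoints/openapi/chat/completions" \
  -d '{
    "model": "glm-4.7-maas",
    "messages": [
      {"role": "user", "content": "Write a function that reverses a linked list."}
    ]
  }'
```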
To use a self-deployed Vertex AI model:
- Navigate to the Model Garden console.
- Find the relevant Vertex AI model.
- Click Enable and complete the provided form to get the necessary commercial use licenses.
For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.
GLM model region availability
GLM models are available in the following regions:
- global
  - Context length: 200,000 tokens
  - Max output: 128,000 tokens
What's next
Learn how to Call open model APIs.

