Qwen models

Qwen models are available for use as managed APIs and self-deployed models on Vertex AI. You can stream your responses to reduce the end-user latency perception. A streamed response uses server-sent events (SSE) to incrementally stream the response.

Managed Qwen models

The following models are available from Qwen to use in Vertex AI. To access a Qwen model, go to its Model Garden model card.

Qwen3-Next-80B Instruct

Qwen3-Next-80B Instruct is a language model from the Qwen3-Next family of models. It is designed for following specific commands and handling very long pieces of text. It uses a smart design called Mixture-of-Experts (MoE), which activates a subset of available parameters to process information, which makes it faster and more cost-effective to run than other models of its size.

The Instruct version is tuned for reliable, direct answers in chat and agent applications and its large context window allows it to maintain an entire conversation or large document in memory.

Go to the Qwen3-Next-80B Instruct model card

Qwen3-Next-80B Thinking

Qwen3-Next-80B Thinking is a language model from the Qwen3-Next family of models. It is specialized for complex problem-solving and deep reasoning. Its "thinking" mode generates a visible, step-by-step reasoning process alongside the final answer, making it ideal for tasks requiring transparent logic, like mathematical proofs, intricate code debugging, or multi-step agent planning.

Go to the Qwen3-Next-80B Thinking model card

Qwen3 Coder (Qwen3 Coder)

Qwen3 Coder ( Qwen3 Coder ) is a large-scale, open-weight model developed for advanced software development tasks. The model's key feature is its large context window, allowing it to process and understand large codebases comprehensively.

Go to the Qwen3 Coder model card

Qwen3 235B (Qwen3 235B)

Qwen3 235B ( Qwen3 235B ) is a large 235B parameter model. The model is distinguished by its "hybrid thinking" capability, which allows users to dynamically switch between a methodical, step-by-step "thinking" mode for complex tasks like mathematical reasoning and coding, and a rapid "non-thinking" mode for general-purpose conversation. Its large context window makes it suitable for use cases requiring deep reasoning and long-form comprehension.

Go to the Qwen3 235B model card

Use Qwen models

For managed models, you can use curl commands to send requests to the Vertex AI endpoint using the following model names. To learn how to make streaming and non-streaming calls to Qwen models, see Call open model APIs .

To use a self-deployed Vertex AI model:

Navigate to the Model Garden console .
Find the relevant Vertex AI model.
Click Enableand complete the provided form to get the necessary commercial use licenses.

For more information about deploying and using partner models, see Deploy a partner model and make prediction requests .

What's next

Learn how to Call open model APIs .