Use open models through Model as a Service (MaaS)

This document describes how to use open models through Model as a Service (MaaS) on Vertex AI. MaaS provides serverless access to selected partner and open-source models, eliminating the need to provision or manage infrastructure.

Model Garden is a centralized library of AI and ML models from Google and Google partners, as well as open models (open-weight and open-source), including MaaS models. Model Garden provides multiple ways to deploy available models on Vertex AI, including models from Hugging Face.

For more information about MaaS, see the partner models documentation.

Before you begin

To use MaaS models, you must enable the Vertex AI API in your Google Cloud project.

```shell
gcloud services enable aiplatform.googleapis.com
```

Enable the model's API

Before you can use a MaaS model, you must enable its API. To do this, go to the model page in Model Garden. Some models available through MaaS are also available for self-deployment, and the Model Garden model cards for the two offerings differ: the MaaS model card includes "API Service" in its name.
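As a quick illustration of the naming convention, a MaaS model ID can be distinguished from a self-deployment ID by its shape. This is only a heuristic based on the Llama example later in this document; the `publisher/model-name-maas` pattern is an assumption, not a documented guarantee:

```python
def looks_like_maas_model_id(model_id: str) -> bool:
    """Heuristic (assumption): MaaS model IDs in Model Garden follow a
    'publisher/model-name-maas' pattern, as in the Llama example."""
    publisher, sep, name = model_id.partition("/")
    return bool(publisher) and sep == "/" and name.endswith("-maas")

# The MaaS offering carries the "-maas" suffix; the self-deploy ID does not.
print(looks_like_maas_model_id("meta/llama-3.3-70b-instruct-maas"))  # True
print(looks_like_maas_model_id("meta/llama-3.3-70b-instruct"))       # False
```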

Call the model using the Google Gen AI SDK for Python

The following example calls the Llama 3.3 model using the Google Gen AI SDK for Python.

```python
from google import genai
from google.genai import types

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
# The model ID from the Model Garden card that includes "API Service".
MODEL = "meta/llama-3.3-70b-instruct-maas"

# Define the prompt to send to the model.
prompt = "What is the distance between earth and moon?"

# Initialize the Google Gen AI SDK client.
client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
)

# Prepare the content for the chat.
contents: types.ContentListUnion = [
    types.Content(role="user", parts=[types.Part.from_text(text=prompt)])
]

# Configure generation parameters.
generate_content_config = types.GenerateContentConfig(
    temperature=0,
    top_p=0,
    max_output_tokens=4096,
)

try:
    # Create a chat instance with the specified model and configuration.
    chat = client.chats.create(model=MODEL, config=generate_content_config)

    # Send the message and print the response.
    response = chat.send_message(contents)
    print(response.text)
except Exception as e:
    print(f"{MODEL} call failed due to {e}")
```

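MaaS models can also be reached over REST. The sketch below builds a request for the OpenAI-compatible chat completions endpoint that Vertex AI exposes for MaaS models; the exact URL pattern and payload shape here are assumptions based on the chat completions convention, so check the model card for the authoritative endpoint. The request is constructed but not sent, since sending it requires an access token (for example from `gcloud auth print-access-token`):

```python
import json

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION"
MODEL = "meta/llama-3.3-70b-instruct-maas"

# Assumed URL pattern for the OpenAI-compatible chat completions endpoint.
url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/"
    f"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi/chat/completions"
)

# OpenAI-style request body mirroring the SDK example above.
payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "What is the distance between earth and moon?"}
    ],
    "temperature": 0,
    "max_tokens": 4096,
}

# Send this with any HTTP client, adding an
# "Authorization: Bearer <access token>" header.
print(url)
print(json.dumps(payload, indent=2))
```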
What's next
