This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support Vertex AI RAG Engine.
Gemini models
The following models support Vertex AI RAG Engine:
- Gemini 2.5 Pro
- Gemini 2.5 Flash
- Gemini 2.5 Flash-Lite
- Gemini 2.0 Flash
Fine-tuned Gemini models aren't supported with Vertex AI RAG Engine.
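To use one of these models with Vertex AI RAG Engine, you attach a RAG retrieval tool when you create the model instance. The following is a minimal sketch using the Vertex AI SDK for Python; it assumes a RAG corpus already exists, and the corpus resource name and query string are illustrative placeholders:

```python
import vertexai
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool

# Initialize the SDK (PROJECT_ID and LOCATION are placeholders)
vertexai.init(project="PROJECT_ID", location="LOCATION")

# Build a retrieval tool backed by an existing RAG corpus
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(rag_corpus="RAG_CORPUS_RESOURCE_NAME")
            ],
        ),
    )
)

# Any Gemini model from the list above works here
rag_model = GenerativeModel("gemini-2.0-flash", tools=[rag_retrieval_tool])
response = rag_model.generate_content("Summarize the documents in the corpus.")
print(response.text)
```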
Self-deployed models
Vertex AI RAG Engine supports all models in Model Garden.
Use Vertex AI RAG Engine with your self-deployed open model endpoints.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- ENDPOINT_ID: Your endpoint ID.

```python
# Create a model instance with your self-deployed open model endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
    tools=[rag_retrieval_tool],
)
```
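Assuming the rag_retrieval_tool from the sketch above, you query a self-deployed endpoint the same way as a Gemini model:

```python
# Retrieval runs against the RAG corpus before the model generates a response
response = rag_model.generate_content("Your query about the corpus documents")
print(response.text)
```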
Models with managed APIs on Vertex AI
The models with managed APIs on Vertex AI that support Vertex AI RAG Engine include the following:
- Llama 3.1
The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, /publisher/meta/models/llama-3.1-405B-instruct-maas, is found in the model card.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.

```python
# Create a model instance with the Llama 3.1 MaaS endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/publisher/meta/models/llama-3.1-405B-instruct-maas",
    tools=RAG_RETRIEVAL_TOOL,
)
```
The following code sample demonstrates how to use the OpenAI-compatible ChatCompletions API to generate a model response.
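The sample calls an OpenAI client object named client. As a sketch of how that client can be constructed (the endpoint path and auth flow here are assumptions based on the Vertex AI OpenAI-compatible endpoint; verify them against the current documentation):

```python
import google.auth
import google.auth.transport.requests
import openai

# Obtain Google Cloud credentials and refresh them to get an access token
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Point the OpenAI client at the Vertex AI OpenAI-compatible endpoint
client = openai.OpenAI(
    base_url=(
        "https://LOCATION-aiplatform.googleapis.com/v1beta1/"
        "projects/PROJECT_ID/locations/LOCATION/endpoints/openapi"
    ),
    api_key=credentials.token,
)
```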
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- MODEL_ID: LLM model for content generation. For example, meta/llama-3.1-405b-instruct-maas.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_ID: The ID of the RAG corpus resource.
- ROLE: The role of the message, for example, user.
- CONTENT: The message content. Use your INPUT_PROMPT here.

```python
# Generate a response with the Llama 3.1 MaaS endpoint
response = client.chat.completions.create(
    model="MODEL_ID",
    messages=[{"role": "ROLE", "content": "CONTENT"}],
    extra_body={
        "extra_body": {
            "google": {
                "vertex_rag_store": {
                    "rag_resources": {"rag_corpus": "RAG_CORPUS_ID"},
                    "similarity_top_k": 10,
                }
            }
        }
    },
)
```
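The generated text can then be read from the standard ChatCompletions response shape:

```python
# Print the first candidate's message content
print(response.choices[0].message.content)
```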

