This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support Vertex AI RAG Engine.
Gemini models
The following models support Vertex AI RAG Engine:
- Gemini 2.5 Pro
- Gemini 2.5 Flash
- Gemini 2.5 Flash-Lite
- Gemini 2.0 Flash
Fine-tuned Gemini models aren't supported with Vertex AI RAG Engine.
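To use one of these models with Vertex AI RAG Engine, you attach a RAG retrieval tool when you create the model instance. The following is a minimal sketch using the Vertex AI SDK for Python; it assumes a RAG corpus already exists, and the corpus resource name and query string are illustrative placeholders:

```python
import vertexai
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool

# Initialize the SDK (PROJECT_ID and LOCATION are placeholders)
vertexai.init(project="PROJECT_ID", location="LOCATION")

# Build a retrieval tool backed by an existing RAG corpus
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(rag_corpus="RAG_CORPUS_RESOURCE_NAME")
            ],
        ),
    )
)

# Any Gemini model from the list above works here
rag_model = GenerativeModel("gemini-2.0-flash", tools=[rag_retrieval_tool])
response = rag_model.generate_content("Summarize the documents in the corpus.")
print(response.text)
```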
Self-deployed models
Vertex AI RAG Engine supports all models in Model Garden.
Use Vertex AI RAG Engine with your self-deployed open model endpoints.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- ENDPOINT_ID: Your endpoint ID.

```python
# Create a model instance with your self-deployed open model endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
    tools=[rag_retrieval_tool],
)
```
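Assuming the rag_retrieval_tool from the sketch above, you query a self-deployed endpoint the same way as a Gemini model:

```python
# Retrieval runs against the RAG corpus before the model generates a response
response = rag_model.generate_content("Your query about the corpus documents")
print(response.text)
```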
Models with managed APIs on Vertex AI
The models with managed APIs on Vertex AI that support Vertex AI RAG Engine include the following:
- Llama 3.1
The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, /publisher/meta/models/llama-3.1-405B-instruct-maas, is found in the model card.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.

```python
# Create a model instance with the Llama 3.1 MaaS endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/publisher/meta/models/llama-3.1-405B-instruct-maas",
    tools=RAG_RETRIEVAL_TOOL,
)
```
The following code sample demonstrates how to use the OpenAI-compatible ChatCompletions API to generate a model response.
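The sample calls an OpenAI client object named client. As a sketch of how that client can be constructed (the endpoint path and auth flow here are assumptions based on the Vertex AI OpenAI-compatible endpoint; verify them against the current documentation):

```python
import google.auth
import google.auth.transport.requests
import openai

# Obtain Google Cloud credentials and refresh them to get an access token
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Point the OpenAI client at the Vertex AI OpenAI-compatible endpoint
client = openai.OpenAI(
    base_url=(
        "https://LOCATION-aiplatform.googleapis.com/v1beta1/"
        "projects/PROJECT_ID/locations/LOCATION/endpoints/openapi"
    ),
    api_key=credentials.token,
)
```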
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- MODEL_ID: LLM model for content generation. For example, meta/llama-3.1-405b-instruct-maas.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_ID: The ID of the RAG corpus resource.
- ROLE: The role of the message, for example, user.
- CONTENT: The message content. Use your INPUT_PROMPT here.

```python
# Generate a response with the Llama 3.1 MaaS endpoint
response = client.chat.completions.create(
    model="MODEL_ID",
    messages=[{"role": "ROLE", "content": "CONTENT"}],
    extra_body={
        "extra_body": {
            "google": {
                "vertex_rag_store": {
                    "rag_resources": {"rag_corpus": "RAG_CORPUS_ID"},
                    "similarity_top_k": 10,
                }
            }
        }
    },
)
```
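The generated text can then be read from the standard ChatCompletions response shape:

```python
# Print the first candidate's message content
print(response.choices[0].message.content)
```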

