Vertex AI RAG Engine supported models

This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support Vertex AI RAG Engine.

Gemini models

The following models support Vertex AI RAG Engine:

Fine-tuned Gemini models are not supported with Vertex AI RAG Engine.

Self-deployed models

Vertex AI RAG Engine supports all models in Model Garden.

Use Vertex AI RAG Engine with your self-deployed open model endpoints.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • ENDPOINT_ID: Your endpoint ID.

    # Create a model instance with your self-deployed open model endpoint
    rag_model = GenerativeModel(
        "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
        tools=[rag_retrieval_tool],
    )
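The string passed to GenerativeModel is a full endpoint resource name. As a quick sanity check, it can be assembled from the three variables above; the helper below is purely illustrative and is not part of the Vertex AI SDK.

```python
def endpoint_resource_name(project_id: str, location: str, endpoint_id: str) -> str:
    """Assemble the self-deployed endpoint resource name (hypothetical helper)."""
    return f"projects/{project_id}/locations/{location}/endpoints/{endpoint_id}"

# Example values; substitute your own project, region, and endpoint ID.
name = endpoint_resource_name("my-project", "us-central1", "1234567890")
print(name)  # projects/my-project/locations/us-central1/endpoints/1234567890
```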

Models with managed APIs on Vertex AI

The models with managed APIs on Vertex AI that support Vertex AI RAG Engine include the following:

The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, /publisher/meta/models/llama-3.1-405B-instruct-maas, can be found in the model card.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.

    # Create a model instance with the Llama 3.1 MaaS endpoint
    rag_model = GenerativeModel(
        "projects/PROJECT_ID/locations/LOCATION/publisher/meta/models/llama-3.1-405B-instruct-maas",
        tools=RAG_RETRIEVAL_TOOL,
    )
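Note that a MaaS model is addressed by a publisher model path rather than an endpoint ID. As a rough illustration (this helper is hypothetical, not part of the SDK), the resource name differs from the self-deployed case only in its final segment:

```python
def maas_model_name(project_id: str, location: str, publisher_model: str) -> str:
    """Assemble a managed-API (MaaS) model resource name (hypothetical helper)."""
    return f"projects/{project_id}/locations/{location}/{publisher_model}"

# Example values; the publisher path comes from the model card.
name = maas_model_name(
    "my-project",
    "us-central1",
    "publisher/meta/models/llama-3.1-405B-instruct-maas",
)
print(name)
```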

The following code sample demonstrates how to use the OpenAI compatible ChatCompletions API to generate a model response.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • MODEL_ID: The LLM model used for content generation. For example, meta/llama-3.1-405b-instruct-maas.
  • INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
  • RAG_CORPUS_ID: The ID of the RAG corpus resource.
  • ROLE: Your role.
  • USER: Your username.
  • CONTENT: Your content.

    # Generate a response with the Llama 3.1 MaaS endpoint
    response = client.chat.completions.create(
        model="MODEL_ID",
        messages=[{"ROLE": "USER", "content": "CONTENT"}],
        extra_body={
            "extra_body": {
                "google": {
                    "vertex_rag_store": {
                        "rag_resources": {
                            "rag_corpus": "RAG_CORPUS_ID"
                        },
                        "similarity_top_k": 10
                    }
                }
            }
        },
    )

What's next
