Reranking for Vertex AI RAG Engine

This page explains reranking and the types of rankers available. It also demonstrates how to use the Vertex AI ranking API to rerank your retrieved responses.

Available rerankers

| Ranker option | Description | Latency | Accuracy | Pricing |
|---|---|---|---|---|
| Vertex AI ranking API | A standalone semantic reranker designed for highly precise relevance scoring and low latency. For more information, see Improve search and RAG quality with ranking API. | Very low (less than 100 milliseconds) | State-of-the-art performance | Per Vertex AI RAG Engine request |
| LLM reranker | Uses a separate call to Gemini to assess the relevance of chunks to a query. | High (1 to 2 seconds) | Model dependent | LLM token pricing |
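
Both rankers are selected through the ranking field of a RagRetrievalConfig, as the code samples below show. As a side-by-side sketch, reusing names from those samples (the model names are the examples given on this page):

from vertexai import rag

# Vertex AI ranking API: set Ranking.rank_service.
ranking_api_config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        rank_service=rag.RankService(model_name="semantic-ranker-default@latest")
    ),
)

# LLM reranker: set Ranking.llm_ranker instead.
llm_reranker_config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(model_name="gemini-2.0-flash")
    ),
)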

Use the Vertex AI ranking API

To use the Vertex AI ranking API, you must enable the Discovery Engine API (for example, with gcloud services enable discoveryengine.googleapis.com). For the list of supported models, see Improve search and RAG quality with ranking API.

These code samples demonstrate how to enable reranking with the Vertex AI ranking API in the tool configuration.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

Replace the following variables used in the sample code:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The region to process the request.
  • MODEL_NAME: The LLM to use for content generation. For example, gemini-2.0-flash.
  • INPUT_PROMPT: The text sent to the LLM for content generation.
  • RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
  • SIMILARITY_TOP_K: Optional. The number of top contexts to retrieve.
  • RANKER_MODEL_NAME: The name of the model used for reranking. For example, semantic-ranker-default@latest.
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")

# Configure retrieval to rerank the top contexts with the Vertex AI ranking API.
config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        rank_service=rag.RankService(
            model_name="RANKER_MODEL_NAME"
        )
    ),
)

# Expose the RAG corpus to the model as a retrieval tool.
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=CORPUS_NAME,
                )
            ],
            rag_retrieval_config=config,
        ),
    )
)

rag_model = GenerativeModel(
    model_name="MODEL_NAME", tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("INPUT_PROMPT")
print(response.text)
# Example response:
#   The sky appears blue due to a phenomenon called Rayleigh scattering.
#   Sunlight, which contains all colors of the rainbow, is scattered
#   by the tiny particles in the Earth's atmosphere....
#   ...
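
If you only need the reranked contexts and not generated text, the same configuration can also be passed to a standalone retrieval call. This is a sketch that mirrors the retrieval_query example in the LLM reranker section later on this page, reusing CORPUS_NAME and config from the sample above; whether rank_service behaves identically in standalone retrieval is an assumption here:

# Sketch: standalone retrieval with the ranking API config defined above.
response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(rag_corpus=CORPUS_NAME),
    ],
    text="INPUT_PROMPT",
    rag_retrieval_config=config,  # assumption: rank_service also applies here
)
print(response)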

REST

To generate content by using Gemini models, make a call to the Vertex AI GenerateContent API. When you specify RAG_CORPUS_RESOURCE in the request, the model automatically retrieves data from Vertex AI RAG Engine.

Replace the following variables used in the sample code:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The region to process the request.
  • MODEL_NAME: The LLM to use for content generation. For example, gemini-2.0-flash.
  • GENERATION_METHOD: The LLM method for content generation. Options include generateContent and streamGenerateContent.
  • INPUT_PROMPT: The text sent to the LLM for content generation.
  • RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
  • SIMILARITY_TOP_K: Optional. The number of top contexts to retrieve.
  • RANKER_MODEL_NAME: The name of the model used for reranking. For example, semantic-ranker-default@latest.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_NAME:GENERATION_METHOD" \
  -d '{
    "contents": {
      "role": "user",
      "parts": {
        "text": "INPUT_PROMPT"
      }
    },
    "tools": {
      "retrieval": {
        "disable_attribution": false,
        "vertex_rag_store": {
          "rag_resources": {
            "rag_corpus": "RAG_CORPUS_RESOURCE"
          },
          "rag_retrieval_config": {
            "top_k": SIMILARITY_TOP_K,
            "ranking": {
              "rank_service": {
                "model_name": "RANKER_MODEL_NAME"
              }
            }
          }
        }
      }
    }
  }'
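
If you prefer to issue the same REST call from Python, here is a minimal sketch using the google-auth and requests libraries. It assumes Application Default Credentials are configured, and the placeholders match the curl command above:

import google.auth
import google.auth.transport.requests
import requests

# Assumption: Application Default Credentials are available in the environment.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

url = (
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID"
    "/locations/LOCATION/publishers/google/models/MODEL_NAME:generateContent"
)
body = {
    "contents": {"role": "user", "parts": {"text": "INPUT_PROMPT"}},
    "tools": {
        "retrieval": {
            "disable_attribution": False,
            "vertex_rag_store": {
                "rag_resources": {"rag_corpus": "RAG_CORPUS_RESOURCE"},
                "rag_retrieval_config": {
                    "top_k": 10,
                    "ranking": {
                        "rank_service": {"model_name": "RANKER_MODEL_NAME"}
                    },
                },
            },
        }
    },
}
response = requests.post(
    url, headers={"Authorization": f"Bearer {credentials.token}"}, json=body
)
print(response.json())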

Use the LLM reranker in Vertex AI RAG Engine

This section presents the prerequisites and code samples for using an LLM reranker.

The LLM reranker supports only Gemini models, which are accessible when the Vertex AI RAG Engine API is enabled. To view the list of supported models, see Gemini models.

To retrieve relevant contexts using the Vertex AI RAG Engine API, do the following:

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

Replace the following variables used in the code sample:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The region to process the request.
  • RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
  • TEXT: The query text to get relevant contexts.
  • MODEL_NAME: The name of the model used for reranking.
from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/[PROJECT_ID]/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
MODEL_NAME = "MODEL_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")

# Configure retrieval to rerank the top contexts with an LLM reranker.
rag_retrieval_config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(
            model_name=MODEL_NAME
        )
    ),
)

response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=CORPUS_NAME,
        )
    ],
    text="TEXT",
    rag_retrieval_config=rag_retrieval_config,
)
print(response)
# Example response:
# contexts {
#   contexts {
#     source_uri: "gs://your-bucket-name/file.txt"
#     text: "....
#   ....
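
The LLM reranker can also be used in the generation flow from the first section. This is a sketch, not one of this page's samples: it reuses CORPUS_NAME and rag_retrieval_config from above and plugs them into the same Tool.from_retrieval pattern shown for the ranking API:

from vertexai.generative_models import GenerativeModel, Tool

# Sketch: pass the llm_ranker config to the retrieval tool used for generation.
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[rag.RagResource(rag_corpus=CORPUS_NAME)],
            rag_retrieval_config=rag_retrieval_config,  # llm_ranker config above
        ),
    )
)
rag_model = GenerativeModel(model_name="gemini-2.0-flash", tools=[rag_retrieval_tool])
print(rag_model.generate_content("TEXT").text)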

REST

Replace the following variables used in the code sample:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The region to process the request.
  • RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
  • TEXT: The query text to get relevant contexts.
  • MODEL_NAME: The name of the model used for reranking.
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts" \
  -d '{
    "vertex_rag_store": {
      "rag_resources": {
        "rag_corpus": "RAG_CORPUS_RESOURCE"
      }
    },
    "query": {
      "text": "TEXT",
      "rag_retrieval_config": {
        "top_k": 10,
        "ranking": {
          "llm_ranker": {
            "model_name": "MODEL_NAME"
          }
        }
      }
    }
  }'

What's next
