Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
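This page describes the Vertex AI RAG Engine pricing and billing based on the Vertex AI RAG Engine components you use, such as models, reranking, and vector storage.
Vertex AI RAG Engine is free to use. However, if you configure Vertex AI RAG Engine components, your billing might be affected. The following sections explain how billing works when you use the RAG components.
Vertex AI RAG Engine supports VPC-SC security controls and CMEK. Data residency and AXT security controls aren't supported.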
Vertex AI RAG Engine supports ingesting data from different data sources, such as local file uploads, Cloud Storage, and Google Drive. Accessing files in these data sources from Vertex AI RAG Engine is free, but the data sources themselves might charge for data transfer, for example, data egress costs.
LLM parser: Vertex AI RAG Engine uses the LLM that you specified to parse your file, and you see and pay the LLM costs directly from your project.
Document AI layout parser: Vertex AI RAG Engine uses the Document AI layout parser that you specified to process your file, and you see and pay for the use of the Document AI layout parser directly from your project.
Vertex AI RAG Engine orchestrates the embedding generation using the embedding model that you specified, and your project is billed for the costs associated with that model.
RAG Engine supports two categories of vector databases for vector search:
RAG-managed database
Bring-Your-Own vector database
A RAG-managed database has two purposes:
It stores RAG resources, such as RAG corpora and RAG files. File contents are excluded.
If you choose, it also handles embedding indexing and retrieval for vector search.
A RAG-managed database uses a Spanner instance as the backend.
For each of your projects, Vertex AI RAG Engine
provisions a customer-specific Google Cloud project and manages
RAG-managed resources that are stored in
Vertex AI RAG Engine, so that your data is physically
isolated.
If you choose the RagManagedDB Basic tier or Scaled tier, Vertex AI RAG Engine provisions a
Spanner Enterprise edition instance in the corresponding
project:
Basic tier: 100 processing units with backup
Scaled tier: Starting at 1 node (1,000 processing units) and
autoscaling up to 10 nodes with backup
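The tier sizes above can be expressed in processing units for rough capacity planning. The following is a minimal sketch, not an official calculator: the `Tier` class and `hourly_cost_range` function are hypothetical names, and the per-processing-unit rate is a placeholder that you would replace with the current Spanner price for your region. Backup and storage charges are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass

# Stated above: 1 node corresponds to 1,000 processing units.
PU_PER_NODE = 1_000


@dataclass(frozen=True)
class Tier:
    """Hypothetical helper describing a RAG-managed database tier."""
    name: str
    min_pu: int  # smallest provisioned size, in processing units
    max_pu: int  # largest size after autoscaling, in processing units


BASIC = Tier("Basic", 100, 100)  # fixed at 100 processing units
SCALED = Tier("Scaled", 1 * PU_PER_NODE, 10 * PU_PER_NODE)  # 1 to 10 nodes


def hourly_cost_range(tier: Tier, price_per_pu_hour: float) -> tuple[float, float]:
    """Return the (min, max) hourly compute cost at a placeholder rate.

    price_per_pu_hour is an assumption supplied by the caller; this page
    does not state a price.
    """
    return (tier.min_pu * price_per_pu_hour, tier.max_pu * price_per_pu_hour)
```

With a made-up rate, `hourly_cost_range(SCALED, rate)` gives the lower and upper bounds of the compute cost as the instance autoscales between 1 and 10 nodes, while the Basic tier returns the same value for both bounds.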
If any RAG corpus in your project uses a RAG-managed
database for vector search, you are charged for the RAG-managed
Spanner instance.
Vertex AI RAG Engine surfaces Spanner costs from your corresponding RAG-managed project to your Google Cloud project, so that you can see and pay Spanner instance costs.
The following reranking tools are supported after retrieval:
LLM reranker: Vertex AI RAG Engine uses the
LLM that you specified to rerank the retrieval results, and you
see and pay the LLM costs directly from your project.
Vertex AI Search ranking API:
Vertex AI RAG Engine uses the
Vertex AI Search ranking API to rerank the retrieval
results, and you see and pay for the Ranking API directly from your project.
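The billing rules described on this page can be summarized as a small lookup. The following is an illustrative sketch only: the `billable_items` function, its parameters, and the string keys are hypothetical and are not part of the Vertex AI SDK. It lists which charges appear in your project for a given configuration, based solely on the statements above. Data transfer charges from the data sources themselves are not modeled.

```python
from __future__ import annotations

# Hypothetical keys for the parsers and rerankers described on this page.
PARSER_CHARGES = {
    "llm_parser": "LLM costs for parsing files",
    "layout_parser": "Document AI layout parser usage",
}
RERANKER_CHARGES = {
    "llm_reranker": "LLM costs for reranking retrieval results",
    "ranking_api": "Vertex AI Search ranking API usage",
}


def billable_items(
    parser: str | None = None,
    rag_managed_vector_search: bool = False,
    reranker: str | None = None,
) -> list[str]:
    """List the charges billed to your project for a RAG configuration."""
    # Embedding generation is billed for the embedding model you specified.
    items = ["Embedding model usage"]
    if parser is not None:
        items.append(PARSER_CHARGES[parser])
    # Spanner is charged only when a corpus uses the RAG-managed
    # database for vector search.
    if rag_managed_vector_search:
        items.append("RAG-managed Spanner instance")
    if reranker is not None:
        items.append(RERANKER_CHARGES[reranker])
    return items
```

For example, `billable_items("llm_parser", True, "ranking_api")` returns four items: embedding, LLM parsing, the Spanner instance, and the ranking API.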
What's next
To learn how to use the Vertex AI SDK to run
Vertex AI RAG Engine tasks, see RAG quickstart for
Python.
To learn about grounding, see Grounding overview.
To learn more about the responses from RAG, see Retrieval and Generation Output of Vertex AI RAG Engine.
To learn about the RAG architecture, see the following:
Infrastructure for a RAG-capable generative AI application using Vertex AI and Vector Search
Infrastructure for a RAG-capable generative AI application using Vertex AI and AlloyDB for PostgreSQL
Last updated 2025-09-04 UTC.