This page describes how Vertex AI RAG Engine pricing and billing depend on the components that you use, such as models, reranking, and vector storage.
For more information, see the Vertex AI RAG Engine overview page.
Pricing and billing
Vertex AI RAG Engine is free to use. However, the components that you configure for Vertex AI RAG Engine, such as parsers, vector databases, and rerankers, might incur charges.
The following explains how billing works for each RAG component that you use.
- Default parser: Free.
- LLM parser: Vertex AI RAG Engine uses the LLM that you specified to parse your file, and you see and pay the LLM costs directly from your project.
- Document AI layout parser: Vertex AI RAG Engine uses the Document AI layout parser that you specified to process your file, and you see and pay for the use of the Document AI layout parser directly from your project.
For more pricing information, see Cost of building and deploying AI models in Vertex AI.
- RAG-managed database
- Bring-Your-Own vector database
A RAG-managed database has two purposes:
- A RAG-managed database stores RAG resources, such as RAG corpora and RAG files. File contents are excluded.
- If you choose, a RAG-managed database performs embedding indexing and retrieval for vector search.
A RAG-managed database uses a Spanner instance as the backend.
For each of your projects, Vertex AI RAG Engine provisions a customer-specific Google Cloud project and manages RAG-managed resources that are stored in Vertex AI RAG Engine, so that your data is physically isolated.
If you choose the RagManagedDB Basic tier or Scaled tier, Vertex AI RAG Engine provisions a Spanner Enterprise edition instance in the corresponding project:
- Basic tier: 100 processing units with backup
- Scaled tier: Starting at 1 node (1,000 processing units) and autoscaling up to 10 nodes with backup
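The tier sizes above can be turned into a rough compute estimate. The following is a minimal sketch, not an official calculator: it assumes that Spanner compute cost scales linearly with processing units, it ignores storage and backup charges, and the function names and the `price_per_node_hour` parameter are hypothetical. Look up the actual rate for your region and edition on the Spanner pricing page.

```python
# Processing-unit arithmetic for the RagManagedDb tiers described above.
# Only the tier sizes come from this page; the prices are placeholders.

PROCESSING_UNITS_PER_NODE = 1_000  # 1 node = 1,000 processing units

BASIC_TIER_UNITS = 100             # Basic tier: 100 processing units
SCALED_TIER_MIN_NODES = 1          # Scaled tier starts at 1 node
SCALED_TIER_MAX_NODES = 10         # Scaled tier autoscales up to 10 nodes


def scaled_tier_unit_range() -> tuple[int, int]:
    """Return the (minimum, maximum) processing units for the Scaled tier."""
    return (
        SCALED_TIER_MIN_NODES * PROCESSING_UNITS_PER_NODE,
        SCALED_TIER_MAX_NODES * PROCESSING_UNITS_PER_NODE,
    )


def estimate_compute_cost(
    processing_units: int,
    price_per_node_hour: float,
    hours: float = 730.0,
) -> float:
    """Estimate compute cost for a number of hours (default: about one month).

    Assumes cost is proportional to processing units, so 100 processing
    units cost one tenth of a node. Storage and backup are not included.
    """
    if processing_units <= 0:
        raise ValueError("processing_units must be positive")
    nodes = processing_units / PROCESSING_UNITS_PER_NODE
    return nodes * price_per_node_hour * hours


if __name__ == "__main__":
    hypothetical_price = 1.0  # placeholder rate, not a real Spanner price
    low, high = scaled_tier_unit_range()
    print("Basic tier:", estimate_compute_cost(BASIC_TIER_UNITS, hypothetical_price))
    print("Scaled tier (min):", estimate_compute_cost(low, hypothetical_price))
    print("Scaled tier (max):", estimate_compute_cost(high, hypothetical_price))
```

Because the Scaled tier autoscales, its actual cost falls somewhere between the minimum and maximum estimates, depending on load.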
If any RAG corpus in your project uses a RAG-managed database for vector search, you are charged for the RAG-managed Spanner instance.
Vertex AI RAG Engine surfaces Spanner costs from your corresponding RAG-managed project to your Google Cloud project, so that you can see and pay for the Spanner instance costs.
For more pricing details on Spanner, see Spanner pricing.
- LLM reranker: Vertex AI RAG Engine uses the LLM that you specified to rerank the retrieval results, and you see and pay the LLM costs directly from your project.
- Vertex AI Search ranking API: Vertex AI RAG Engine uses the Vertex AI Search ranking API to rerank the retrieval results, and you see and pay for the ranking API directly from your project.
Delete Vertex AI RAG Engine
The following code samples demonstrate how to delete a Vertex AI RAG Engine by using the Google Cloud console, Python, and REST:
- Version 1 (v1) API parameters and code samples.
- v1beta1 API parameters and code samples.
What's next
- To learn how to use the Vertex AI SDK to run Vertex AI RAG Engine tasks, see RAG quickstart for Python.
- To learn about grounding, see Grounding overview.
- To learn more about the responses from RAG, see Retrieval and Generation Output of Vertex AI RAG Engine.
- To learn about the RAG architecture:

