RAG Engine on Gemini Enterprise Agent Platform billing

This page describes pricing and billing for RAG Engine on Gemini Enterprise Agent Platform, based on the components that you use, such as models, reranking, and vector storage.

For more information, see the RAG Engine on Gemini Enterprise Agent Platform overview page.

Pricing and billing

The following table explains how billing works for each RAG Engine component that you use.

Component

How billing works with RAG Engine

Data ingestion

RAG Engine supports ingesting data from different data sources, such as local file uploads, Cloud Storage, and Google Drive. RAG Engine doesn't charge for accessing files in these data sources, but the data sources themselves might charge for data transfer, such as data egress costs.

Data transformation (file parsing)

Default parser: Free.
LLM parser: RAG Engine uses the LLM that you specify to parse your files. The LLM costs appear in, and are billed directly to, your project.
Document AI layout parser: RAG Engine uses the Document AI layout parser that you specify to process your files. The Document AI layout parser costs appear in, and are billed directly to, your project.

Data transformation (file chunking)

RAG Engine supports fixed-size chunking, which is free.
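As an illustration of what fixed-size chunking means, the following Python sketch splits text into equal-sized windows with an optional overlap. It is not the RAG Engine implementation: the function name `fixed_size_chunks` and the `chunk_size` and `overlap` parameters are hypothetical, and sizes are measured in characters here for simplicity, whereas the service might measure them differently (for example, in tokens).

```python
def fixed_size_chunks(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters. Both values are
    illustrative assumptions, not RAG Engine defaults.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # distance between the starts of two windows
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1))
    # prints ['abcd', 'defg', 'ghij']
```

Because chunking of this kind is a simple local computation with no model calls, it is consistent with the step carrying no charge.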

Embedding generation

RAG Engine orchestrates the embedding generation using the embedding model that you specified, and your project is billed for the costs associated with that model.

For more pricing information, see Cost of building and deploying AI models in Gemini Enterprise Agent Platform.

Data indexing and retrieval

RAG Engine supports two categories of vector databases for vector search:

RAG-managed database
Bring-Your-Own vector database

A RAG-managed database serves two purposes:

It stores RAG resources, such as RAG corpora and RAG files. File contents are excluded.
If you choose, it also handles embedding indexing and retrieval for vector search.

A RAG-managed database uses a Spanner instance as the backend.

For each of your projects, RAG Engine provisions a customer-specific Google Cloud project in which it manages your RAG-managed resources, so that your data is physically isolated.

If you choose the RagManagedDB Basic tier or Scaled tier, RAG Engine provisions a Spanner Enterprise edition instance in the corresponding project:

Basic tier: 100 processing units, with backup
Scaled tier: Starts at 1 node (1,000 processing units) and autoscales up to 10 nodes, with backup
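To make the tier sizes concrete, the following Python sketch expresses both tiers in processing units and estimates a compute-cost range. It assumes that compute cost scales linearly with processing units, and it ignores storage, backup, and network costs. `TierCapacity` and `compute_cost_range` are hypothetical names, and `rate_per_node_hour` is a placeholder that you supply from the Spanner pricing page, not a real price.

```python
from dataclasses import dataclass

PROCESSING_UNITS_PER_NODE = 1_000  # 1 node = 1,000 processing units


@dataclass(frozen=True)
class TierCapacity:
    """Compute capacity bounds for a RagManagedDB tier (illustrative)."""
    name: str
    min_processing_units: int
    max_processing_units: int


# Basic is fixed at 100 processing units; Scaled autoscales from 1 to 10 nodes.
BASIC = TierCapacity("Basic", 100, 100)
SCALED = TierCapacity(
    "Scaled",
    1 * PROCESSING_UNITS_PER_NODE,
    10 * PROCESSING_UNITS_PER_NODE,
)


def compute_cost_range(
    tier: TierCapacity, hours: float, rate_per_node_hour: float
) -> tuple[float, float]:
    """Return (low, high) compute cost for `tier` over `hours`.

    Assumes linear pricing per processing unit; the rate is caller-supplied.
    """
    low = tier.min_processing_units * hours * rate_per_node_hour / PROCESSING_UNITS_PER_NODE
    high = tier.max_processing_units * hours * rate_per_node_hour / PROCESSING_UNITS_PER_NODE
    return low, high


if __name__ == "__main__":
    # 2.0 is a made-up rate used only to show the arithmetic.
    for tier in (BASIC, SCALED):
        print(tier.name, compute_cost_range(tier, hours=10, rate_per_node_hour=2.0))
```

Under these assumptions, the Basic tier costs one tenth of a single node, while the Scaled tier can cost between one and ten times a single node, depending on how far it autoscales.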

If any RAG corpus in your project uses a RAG-managed database for vector search, you are charged for the RAG-managed Spanner instance.
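The charging rule is project-wide: a single corpus that uses the RAG-managed database for vector search is enough to incur the Spanner charge. A minimal Python sketch of that rule follows. The `Corpus` class and the `RAG_MANAGED_DB` and `BYO_VECTOR_DB` labels are hypothetical stand-ins for illustration, not RAG Engine API types.

```python
from dataclasses import dataclass

# Illustrative labels only; these are not RAG Engine API enum values.
RAG_MANAGED_DB = "rag_managed_db"
BYO_VECTOR_DB = "byo_vector_db"


@dataclass
class Corpus:
    """A RAG corpus and the vector database it uses for vector search."""
    name: str
    vector_db: str


def project_incurs_spanner_charge(corpora: list[Corpus]) -> bool:
    """Return True if at least one corpus uses the RAG-managed database."""
    return any(corpus.vector_db == RAG_MANAGED_DB for corpus in corpora)


if __name__ == "__main__":
    corpora = [
        Corpus("support-docs", BYO_VECTOR_DB),
        Corpus("product-manuals", RAG_MANAGED_DB),
    ]
    print(project_incurs_spanner_charge(corpora))  # prints True
```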

RAG Engine surfaces the Spanner costs from the corresponding RAG-managed project in your Google Cloud project, so that you can see and pay for the Spanner instance costs.

For more pricing details on Spanner, see Spanner pricing.

Reranking for RAG Engine on Gemini Enterprise Agent Platform

The following reranking tools are supported after retrieval:

LLM reranker: RAG Engine uses the LLM that you specify to rerank the retrieval results. The LLM costs appear in, and are billed directly to, your project.
Agent Search ranking API: RAG Engine uses the Agent Search ranking API to rerank the retrieval results. The ranking API costs appear in, and are billed directly to, your project.
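The table above can be summarized as a small lookup from configuration choices to the items billed to your project. The following Python sketch is one way to express it. The function name `billable_items`, its parameters, and the string labels are hypothetical and chosen for illustration; they are not part of any RAG Engine API, and the sketch leaves out charges from the data sources themselves, such as data egress.

```python
from typing import Optional


def billable_items(
    parser: str = "default",
    reranker: Optional[str] = None,
    uses_rag_managed_db: bool = False,
) -> list[str]:
    """List the items billed to your project for one configuration.

    parser: "default", "llm", or "document_ai_layout" (illustrative labels).
    reranker: None, "llm", or "ranking_api" (illustrative labels).
    """
    # Embedding generation is always billed for the model you specify.
    items = ["embedding model usage"]

    if parser == "llm":
        items.append("LLM usage for parsing")
    elif parser == "document_ai_layout":
        items.append("Document AI layout parser usage")
    elif parser != "default":  # the default parser is free
        raise ValueError(f"unknown parser: {parser!r}")

    if reranker == "llm":
        items.append("LLM usage for reranking")
    elif reranker == "ranking_api":
        items.append("ranking API usage")
    elif reranker is not None:
        raise ValueError(f"unknown reranker: {reranker!r}")

    if uses_rag_managed_db:
        items.append("RAG-managed Spanner instance")

    return items


if __name__ == "__main__":
    print(billable_items(parser="llm", reranker="ranking_api", uses_rag_managed_db=True))
```

Fixed-size chunking and the default parser don't appear in the result because they are free.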

Delete RAG Engine

The following code samples demonstrate how to delete RAG Engine by using the Google Cloud console, Python, and REST:

Version 1 (v1) API parameters and code samples.
v1beta1 API parameters and code samples.

What's next

To learn how to use the Vertex AI SDK to run RAG Engine on Gemini Enterprise Agent Platform tasks, see RAG quickstart for Python.
To learn about grounding, see Grounding overview.
To learn more about the responses from RAG, see Retrieval and Generation Output of RAG Engine.
To learn about the RAG architecture, see the following:
- Infrastructure for a RAG-capable generative AI application using Agent Platform and Vector Search
- Infrastructure for a RAG-capable generative AI application using Agent Platform and AlloyDB for PostgreSQL